Stephen Pickles <stephen.pickles@man.ac.uk>
http://www.realitygrid.org
http://www.realitygrid.org/TeraGyroid.html
UKLight Town Meeting, NeSC, Edinburgh, 9/9/2004
TeraGyroid
HPC Applications ready for UKLight
The TeraGyroid Project
Funded by EPSRC (UK) & NSF (USA) to join the UK e-Science Grid and US TeraGrid
– application from RealityGrid, a UK e-Science Pilot Project
– 3-month project including work exhibited at SC’03 and SC Global, Nov 2003
– thumbs up from TeraGrid mid-September, funding from EPSRC approved later
Main objective was to deliver high-impact science that would not be possible without the combined resources of the US and UK grids
Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods
– featured world’s largest lattice-Boltzmann simulation
– 1024^3-cell simulation of gyroid phase demands terascale computing
• hence “TeraGyroid”
Networking
[Diagram: two HPC engines, a visualization engine and storage exchanging checkpoint files, visualization data, compressed video, and steering control/status messages over three classes of transport: UDP (realtime), TCP (near-realtime) and TCP (non-realtime)]
LB3D: 3-dimensional Lattice-Boltzmann simulations
LB3D code is written in Fortran90 and parallelized using MPI
Scales linearly on all available resources (Lemieux, HPCx, CSAR, Linux/Itanium II clusters)
Data produced during a single run can range from hundreds of gigabytes to terabytes
Simulations require supercomputers
High-end visualization hardware (e.g. SGI Onyx, dedicated viz clusters) and parallel rendering software (e.g. VTK) needed for data analysis
3D datasets showing snapshots from a simulation of spinodal decomposition: a binary mixture of water and oil phase separates. ‘Blue’ areas denote high water densities and ‘red’ marks the interface between the two fluids.
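For readers unfamiliar with the method, a brief note (not from the original slide): lattice-Boltzmann codes such as LB3D are built around the single-relaxation-time (BGK) collide-and-stream update

    f_i(x + c_i Δt, t + Δt) = f_i(x, t) − (1/τ) [ f_i(x, t) − f_i^eq(ρ, u) ]

where the f_i are particle distributions along a discrete set of lattice velocities c_i, τ is a relaxation time, and f_i^eq depends on the local density ρ and velocity u; multi-component and amphiphilic (surfactant) interactions are layered on top of this basic step.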
Computational Steering of Lattice-Boltzmann Simulations
LB3D is instrumented for steering using the RealityGrid steering library (a sketch of a typical steering loop appears below).
Malleable checkpoint/restart functionality allows ‘rewinding’ of simulations and run-time job migration across architectures.
Steering reduces storage requirements because the user can adapt data dumping frequencies.
CPU time can be saved because users need not wait for jobs to finish when they can already see that nothing relevant is happening.
Instead of doing “task farming”, parameter searches are accelerated by “steering” through parameter space.
Analysis time is significantly reduced because less irrelevant data is produced.
Applied to study of gyroid mesophase of amphiphilic liquid crystals at unprecedented space and time scales
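As a rough illustration of what "instrumented for steering" means in practice, the sketch below shows the shape of a steered time-stepping loop. It is written in Python purely for readability; the names (sim, steering, poll_commands, publish_status) are hypothetical stand-ins and are not the RealityGrid steering library API, which is a C library with Fortran90 bindings.

def run_steered_simulation(sim, steering, max_steps, check_every=10):
    # sim and steering are hypothetical objects standing in for the
    # simulation state and the connection to the steering middleware.
    for step in range(max_steps):
        sim.advance_one_timestep()
        if step % check_every != 0:
            continue
        for cmd in steering.poll_commands():           # any pending user commands?
            if cmd.name == "set_param":
                sim.set_parameter(cmd.key, cmd.value)  # e.g. change a coupling constant
            elif cmd.name == "set_output_frequency":
                sim.output_every = cmd.value           # adapt data-dumping frequency
            elif cmd.name == "checkpoint":
                sim.write_checkpoint(cmd.path)         # enables rewind and job migration
            elif cmd.name == "stop":
                return                                 # nothing relevant happening: stop early
        steering.publish_status(step=step, values=sim.monitored_quantities())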
Parameter space exploration
Initial condition: random water/surfactant mixture.
Self-assembly starts.
Rewind and restart from checkpoint.
Lamellar phase: surfactant bilayers between water layers.
Cubic micellar phase, low surfactant density gradient.
Cubic micellar phase, high surfactant density gradient.
Strategy
Aim: use federated resources of US TeraGrid and UK e-Science Grid to accelerate scientific process
Rapidly map out parameter space using large number of independent “small” (128^3) simulations
– use job cloning and migration to exploit available resources and save equilibration time
Monitor their behaviour using on-line visualization
Hence identify parameters for high-resolution simulations on HPCx and Lemieux
– 1024^3 on Lemieux (PSC) – takes 0.5 TB to checkpoint!
– create initial conditions by stacking smaller simulations with periodic boundary conditions (see the sketch at the end of this slide)
Selected 128^3 simulations were used for long-time studies
All simulations monitored and steered by a geographically distributed team of computational scientists
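A minimal sketch (Python/NumPy, illustrative names only) of the "stacking" step mentioned above: because the small runs use periodic boundary conditions, an equilibrated 128^3 box can be tiled along each axis to produce a self-consistent 1024^3 initial condition without a fresh equilibration phase.

import numpy as np

def stack_periodic_box(small_box, reps=(8, 8, 8)):
    # Tile an equilibrated periodic box along x, y and z; the periodic
    # boundaries make the copies join up seamlessly at the tile faces.
    # 8 x 8 x 8 copies of a 128^3 field give a 1024^3 initial condition.
    return np.tile(small_box, reps)

# Toy usage with a 4^3 field standing in for one 128^3 density component:
small = np.random.rand(4, 4, 4)
large = stack_periodic_box(small, reps=(2, 2, 2))
assert large.shape == (8, 8, 8)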
The Architecture of Steering
[Diagram: the simulation and each visualization link the steering library and expose a Steering Grid Service (GS); these services publish themselves to a Registry, steering clients find and bind to them, clients connect to steer, visualization output goes to displays, and bulk data transfer between simulation and visualization uses Globus-IO]
components start independently and attach/detach dynamically
remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid
multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java)
OGSI middle tier
• Computations run at HPCx, CSAR, SDSC, PSC and NCSA
• Visualizations run at Manchester, UCL, Argonne, NCSA, Phoenix
• Scientists in 4 sites steer calculations, collaborating via Access Grid
• Visualizations viewed remotely
• Grid services run anywhere
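To make the publish/find/bind pattern in the diagram concrete, here is a purely conceptual sketch in plain Python. It is not OGSI or the actual RealityGrid middleware; it only illustrates how components that start independently can register an endpoint and be discovered and bound to later.

class Registry:
    # Toy in-memory stand-in for the Service Registry.
    def __init__(self):
        self._entries = {}

    def publish(self, name, endpoint):
        # Called by a simulation or visualization when it starts.
        self._entries[name] = endpoint

    def find(self, name):
        # Called by a steering client looking for a running component.
        return self._entries.get(name)

class SteeringClient:
    def bind(self, endpoint):
        # In the real system this opens a connection to a Steering Grid
        # Service; here we simply remember the (hypothetical) endpoint.
        self.endpoint = endpoint
        return self

registry = Registry()
registry.publish("lb3d-run-42", "https://example.org/steering/lb3d-run-42")  # hypothetical URL
client = SteeringClient().bind(registry.find("lb3d-run-42"))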
SC Global ’03 Demonstration
TeraGyroid Testbed
[Network map: visualization and computation sites (PSC, ANL, NCSA, Caltech, SDSC and Phoenix in the US; UCL, Daresbury and Manchester in the UK) linked via Starlight (Chicago), Netherlight (Amsterdam), BT provision, SJ4 and MB-NG; legend marks network PoPs, Access Grid nodes, the Service Registry, the production network, dual-homed systems, and 10 Gbps and 2 x 1 Gbps links]
Trans-Atlantic Network
Collaborators: Manchester Computing, Daresbury Laboratory Networking Group, MB-NG and UKERNA, UCL Computing Service, BT, SURFnet (NL), Starlight (US), Internet-2 (US)
TeraGyroid: Hardware Infrastructure
Computation (using more than 6000 processors) including:
– HPCx (Daresbury), 1280 procs IBM Power4 Regatta, 6.6 Tflops peak, 1.024 TB
– Lemieux (PSC), 3000 procs HP/Compaq, 3 TB memory, 6 Tflops peak
– TeraGrid Itanium2 cluster (NCSA), 256 procs, 1.3 Tflops peak
– TeraGrid Itanium2 cluster (SDSC), 256 procs, 1.3 Tflops peak
– Green (CSAR), SGI Origin 3800, 512 procs, 0.512 TB memory (shared)
– Newton (CSAR), SGI Altix 3700, 256 Itanium 2 procs, 384 GB memory (shared)
Visualization:
– Bezier (Manchester), SGI Onyx 300, 6x IR3, 32 procs
– Dirac (UCL), SGI Onyx 2, 2x IR3, 16 procs
– SGI loan machine, Phoenix, SGI Onyx 1x IR4, 1x IR3, commissioned on site
– TeraGrid Visualization Cluster (ANL), Intel Xeon
– SGI Onyx (NCSA)
Service Registry:
– Frik (Manchester), Sony Playstation2
Storage:
– 20 TB of science data generated in project
– 2 TB moved to long-term storage for on-going analysis – Atlas Petabyte Storage System (RAL)
Access Grid nodes at Boston University, UCL, Manchester, Martlesham, Phoenix (4)
Network lessons
Less than three weeks to debug networks
– applications people and network people nodded wisely but didn’t understand each other
– middleware such as GridFTP is infrastructure to applications folk, but an application to network folk
– rapprochement necessary for success
Grid middleware not designed with dual-homed systems in mind
– HPCx, CSAR (Green) and Bezier are busy production systems
– had to be dual-homed on SJ4 and MB-NG
– great care with routing
– complication: we needed to drive everything from laptops that couldn’t see the MB-NG network
Many other problems encountered
– but nothing that can’t be fixed once and for all given persistent infrastructure
Measured Transatlantic Bandwidths during SC’03
TeraGyroid: Summary
Real computational science...
– Gyroid mesophase of amphiphilic liquid crystals
– Unprecedented space and time scales
– investigating phenomena previously out of reach
...on real Grids...
– enabled by high-bandwidth networks
...to reduce time to insight
[Images: interfacial surfactant density; dislocations]
TeraGyroid: Collaborating Organisations
Our thanks to hundreds of individuals at:...
Argonne National Laboratory (ANL)
Boston University
BT
BT Exact
Caltech
CSC
Computing Services for Academic Research (CSAR)
CCLRC Daresbury Laboratory
Department of Trade and Industry (DTI)
Edinburgh Parallel Computing Centre
Engineering and Physical Sciences Research Council (EPSRC)
Forschungszentrum Juelich
HLRS (Stuttgart)
HPCx
IBM
Imperial College London
National Center for Supercomputing Applications (NCSA)
Pittsburgh Supercomputing Center
San Diego Supercomputer Center
SCinet
SGI
SURFnet
TeraGrid
Tufts University, Boston
UKERNA
UK Grid Support Centre
University College London
University of Edinburgh
University of Manchester
http://www.realitygrid.org
http://www.realitygrid.org/TeraGyroid.html
The TeraGyroid Experiment
S. M. Pickles1, R. J. Blake2, B. M. Boghosian3, J. M. Brooke1, J. Chin4, P. E. L. Clarke5, P. V. Coveney4, N. González-Segredo4, R. Haines1, J. Harting4, M. Harvey4, M. A. S. Jones1, M. Mc Keown1, R. L. Pinning1, A. R. Porter1, K. Roy1, and M. Riding1.
1. Manchester Computing, University of Manchester
2. CLRC Daresbury Laboratory, Daresbury
3. Tufts University, Massachusetts
4. Centre for Computational Science, University College London
5. Department of Physics & Astronomy, University College London
New Application at AHM2004
Philip Fowler, Peter Coveney, Shantenu Jha and Shunzhou Wan
UK e-Science All Hands Meeting31 August – 3 September 2004
“Exact” calculation of peptide-protein binding energies by steered thermodynamic integration using high-performance computing grids.
Why are we studying this system?
Measuring binding energies is vital, e.g. for designing new drugs.
Calculating a peptide-protein binding energy can take weeks to months.
We have developed a grid-based method to accelerate this process
To compute ΔG_bind during the AHM 2004 conference, i.e. in less than 48 hours
Using federated resources of UK National Grid Service and US TeraGrid
Thermodynamic Integration on Computational Grids
[Diagram: a starting conformation seeds successive simulations at λ = 0.1, 0.2, 0.3, …, 0.9 (10 sims, each 2 ns); each window is monitored (H against time t) and checked for convergence, then the windows are combined to calculate the integral]
Use steering to launch, spawn and terminate jobs
Run each independent job on the Grid
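For reference, the quantity being assembled here is the standard thermodynamic integration estimate ΔG = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ, where each independent grid job supplies the converged time-average ⟨∂H/∂λ⟩ for one λ window. A minimal sketch of the final "combine and calculate integral" step follows (Python/NumPy; the numerical values are placeholders, not the AHM 2004 data):

import numpy as np

# One λ window per independent grid job; dHdlam holds the converged
# time-averages <dH/dλ> reported by each job (placeholder values only).
lam = np.linspace(0.0, 1.0, 11)
dHdlam = np.array([310.0, 180.0, 95.0, 40.0, 5.0,
                   -20.0, -40.0, -55.0, -65.0, -72.0, -78.0])

# ΔG = ∫_0^1 <dH/dλ> dλ, estimated with the trapezoidal rule over the windows.
delta_G = np.trapz(dHdlam, lam)
print(f"Estimated ΔG ≈ {delta_G:.1f} (same units as dH/dλ)")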
[Figure: monitoring, checkpointing, and steering and control]
We successfully ran many simulations…
This is the first time we have completed an entire calculation.
– Insight gained will help us improve the throughput.
The simulations were started at 5pm on Tuesday and the data was collated at 10am Thursday.
26 simulations were run
At 4.30pm on Wednesday, we had nine simulations in progress (140 processors)
– 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL
We simulated over 6.8ns of classical molecular dynamics in this time
Very preliminary results
[Plot: “Thermodynamic Integrations” — dE/dλ (y-axis, -200 to 400) against λ (x-axis, 0 to 1), series labelled “dppo”]
We expect our value to improve with further analysis around the endpoints.
ΔG (kcal/mol):
– Experiment: -1.0 ± 0.3
– “Quick and dirty” analysis*: -9 to -12
* as at 41 hours
Conclusions
We can harness today’s grids to accelerate high-end computational science
On-line visualization and job migration require high bandwidth networks
Need persistent network infrastructure
– else set-up costs are too high
QoS: Would like ability to reserve bandwidth
– and processors, graphics pipes, AG rooms, virtual venues, nodops... (but that’s another story)
Hence our interest in UKLight