VPH NoE, 2010
Accessing European Computing Platforms
Stefan Zasada
University College London
VPH NoE, 2010
Contents
• Introduction to grid computing
• What resources are available to VPH?
• How to gain access to computational resources
• Deploying applications on EU resources
• Making things easier with VPH ToolKit components
• Coupling multiscale simulations
• AHE demonstration
• Going further
VPH NoE, 2010
What is a Grid?
Allows resource sharing between multiple, distinct organisations
Crosses administrative domains
Security is important – only authorized users are able to use resources
Differs from cluster computing
“Computing as a utility” – Clouds too?
VPH NoE, 2010
Resources Available to VPH Researchers
DEISA: a heterogeneous HPC infrastructure currently formed by eleven European national supercomputing centres, interconnected by a dedicated high-performance network.
EGI/EGEE: consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and sustains 100,000 concurrent jobs. EGI includes the national grid initiatives.
Clouds: Amazon EC2 etc.
VPH NoE, 2010
Compute Allocations for VPH
Conducted a survey of all VPH-I projects and Seed EPs to assess their computational requirements.
This was used to prepare an application to the DEISA Virtual Communities programme for an allocation of resources to support the work of all VPH-I projects, managed by the NoE.
We have set up these VPH VC allocations and run them for the past 2 years – now extended until the end of DEISA in 2011
VPH NoE, 2010
EGEE/EGI
EGEE has transformed into EGI and will start to integrate national grid infrastructures. It consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and sustains 100,000 concurrent jobs.
EGEE/EGI have agreed to provide access for VPH researchers through their BioMed VO. Ask for more details if you would like to use this.
If you would like access, contact [email protected]
VPH NoE, 2010
DEISA Project Allocations
VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011:
• HECToR (Cray, UK)
• SARA (IBM Power 6, Netherlands)
+ euHeart in the second wave, and other non-VPH EU projects
VPH NoE, 2010
Choosing the right resources for your application
Parallel/MPI/HPC applications:
Use DEISA (some DEISA sites have policies to promote large-scale computing).
Embarrassingly parallel/task farming:
Use EGI or your National Grid Initiative.
Split multiscale applications between the appropriate resources.
VPH NoE, 2010
Uniform access to resources at all scales
Middleware provides the underlying computational infrastructure
Allows the different resources to be used seamlessly (grid, cloud etc.)
Automates machine-to-machine communication with minimal user input
Virtualises resources – the user need know nothing of the underlying operating systems etc.
Requires X.509 certificate!
VPH NoE, 2010
Accessing DEISA/EGEE
DEISA: provides access through UNICORE/SSH (some sites have Globus). Applications are compiled at the terminal, then submitted to the batch queue or via UNICORE.
EGI/EGEE: accessed via gLite; often you submit a job to compile.
National Grid Initiatives: should run gLite if part of EGI; may run Globus/UNICORE.
VPH NoE, 2010
The Application Hosting Environment
Based on the idea of applications as web services
Lightweight hosting environment for running unmodified applications on grid resources (DEISA, TeraGrid) and on local resources (departmental clusters)
Community model: expert user installs and configures an application and uses the AHE to share it with others
Simple clients with very limited dependencies
Non-invasive – no need for extra software to be installed on resources
http://toolkit.vph-noe.eu/home/tools/compute-resources/tools-for-accessing-compute-resources/ahe.html
VPH NoE, 2010
Motivation for the AHE
Problems with current middleware solutions:
• Difficult for an end user to configure and/or install
• Dependent on lots of supporting software also being installed
• Require modified versions of common libraries
• Require non-standard ports to be opened on the firewall
We have access to many international grids running different middleware stacks, and need to be able to interoperate seamlessly between them.
VPH NoE, 2010
AHE Functionality
Launch simulations on multiple grid resources
Single interface to monitor and manipulate all simulations launched on the various grid resources
Run simulations without having to manually stage files or GSISSH in
Retrieve files to local machine when simulation is done
Can use a combination of different clients – PDA, desktop GUI, command line
VPH NoE, 2010
AHE Design Constraints
Client does not require any other middleware to be installed locally
Client is NAT'd and firewalled
Client does not have to be a single machine
Client needs to be able to upload and download files but doesn't have a local installation of GridFTP
Client doesn’t maintain information on how to run the application
Client doesn’t care about changes to the backend resources
VPH NoE, 2010
AHE 2.0 Provides a Single Platform to:
Launch applications on UNICORE and Globus 4 grids by acting as an OGSA-BES and GT4 client
Create advance reservations using HARC and launch applications into the reservation
Steer applications using the RealityGrid steering API or GENIUS project steering system
Launch cross-site MPIg applications
AHE 3 plans to support:
SPRUCE urgent job submission
Lightweight certificate sharing mechanisms
VPH NoE, 2010
Virtualizing Applications
The Application Instance/Simulation is the central entity, represented by a stateful WS-Resource. State properties include:
• simulation owner
• target grid resource
• job ID
• simulation input files and URLs
• simulation output files and URLs
• job status
VPH NoE, 2010
AHE Reach
[Diagram: AHE reach covers DEISA (UNICORE), the UK NGS (Leeds, Manchester, Oxford, RAL, HPCx), TeraGrid (Globus) and local resources, accessed via GridSAM, Globus and UNICORE middleware.]
VPH NoE, 2010
Cross-site access
[Diagram: a scientific application (e.g. the bloodflow simulation HemeLB) is launched via the Application Hosting Environment (AHE), acting as the science-specific client technology, onto HPC resources within NGS and DEISA. One route uses the TeraGrid Globus 4 stack (Globus GRAM 4 and GridFTP interfaces, security policies with gridmaps, GLUE-based information); the other uses the UNICORE grid middleware with an OGSA-BES interface (security policies such as XACML, GLUE-based information). Each site provides a common environment for massively parallel HemeLB jobs (on DEISA via DEISA modules), alongside storage resources such as tape archives and robots.]
Stefan Zasada, Steven Manos, Morris Riedel, Johannes Reetz, Michael Rambadt et al., Preparation for the Virtual Physiological Human (VPH) project that requires interoperability of numerous Grids
VPH NoE, 2010
AHE Deployment
AHE Server
Released as a VirtualBox VM image – download image file and import to VirtualBox
All required services, containers etc pre-configured
User just needs to configure services for their target resources/applications
AHE Client
User’s machine must have Java installed
User downloads and untars client package
Imports X.509 certificate into Java keystore using provided script
Configures client with endpoints of AHE services supplied by expert user
VPH NoE, 2010
Hosting a New Application
Expert user must:
Install and configure application on all resources on which it is being shared
Create an XSLT template for the application (easily cloned from an existing template)
Add the application to the RMInfo.xml file
Run a script to reread the configuration
Documentation covers whole process of deploying AHE & applications on NGS and TeraGrid
VPH NoE, 2010
Authentication via local credentials
A designated individual puts a single certificate into a credential repository controlled by our new “gateway” service.
User uses a local authentication service to authenticate to our gateway service. Our gateway service provides a session key (not shown) to our modified AHE client
and our modified AHE Server to enable the AHE client to authenticate to the AHE Server.
Our gateway service obtains a proxy certificate from its credential repository as necessary and gives it to our modified AHE Server to interact with the grid.
User now has no certificate interaction. Private key of the certificate is never exposed to the user.
Audited Credential Delegation
[Diagram: a PI or system administrator deposits the certificate in the credential repository; the user, via our modified AHE client and a local authentication service, authenticates to our gateway service, which drives the modified AHE Server to interact with the grid middleware and computational grid on the user's behalf.]
VPH NoE, 2010
ACD Architecture
[Diagram: the Gateway Service combines a Credential Repository (project and resource certificates, proxies and keys, plus trusted and revoked CA lists), a Local Database Authentication Server (DB_LAS, holding username/password pairs such as u1,pw1 … u4,pw4) and an Authorization Module implementing Parameterized Role Based Access Control (RolePermission: Role → [Task]; UserRole: UserID → [Role]).]
A recent AHE + ACD usability study found the combined solution to be much more usable than other interfaces.
VPH NoE, 2010
Deploying multiscale applications
Loosely coupled:
• Codes launched sequentially – output of one app is the input to the next
• Could be distributed across multiple resources
• Perl scripts
• GSEngine
Tightly coupled:
• Codes launched simultaneously
• Could be distributed across multiple resources
• RealityGrid Steering API
• MPIg
VPH NoE, 2010
Workflow Tools
Many tools exist to allow users to connect up applications running on a grid into a workflow: Taverna, OMII-BPEL, Triana, GSEngine, scripts.
Most workflow tools interface to and orchestrate web services
Workflows can also include data access, data processing and analysis services
Can assist with connecting different VPH models together
VPH NoE, 2010
Constructing workflows with the AHE
By calling the command-line clients from a Perl script, complex workflows can be achieved.
Easily create chained or ensemble simulations.
For example, jobs can be chained together (see the sketch after this list):
• ahe-prepare – prepare a new simulation for the first step
• ahe-start – start the step
• ahe-monitor – poll until the step is complete
• ahe-getoutput – download the output files
• repeat for the next step
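As a concrete illustration, the chained steps above can be driven from a Perl script along the following lines. This is only a sketch: the ahe-prepare, ahe-start, ahe-monitor and ahe-getoutput clients are those named in this deck, but the per-step configuration files, the command arguments and the completion strings matched by the monitoring loop are assumptions, not the actual client interface.

#!/usr/bin/perl
# Sketch of a chained AHE workflow driven from Perl (arguments are assumed).
use strict;
use warnings;

my @steps = ('step1.conf', 'step2.conf', 'step3.conf');  # hypothetical per-step inputs

foreach my $input (@steps) {
    # Prepare a new simulation for this step (argument assumed).
    system('ahe-prepare', $input) == 0 or die "ahe-prepare failed for $input\n";

    # Start the step.
    system('ahe-start') == 0 or die "ahe-start failed\n";

    # Poll until the step is complete (status strings assumed).
    while (1) {
        my $status = `ahe-monitor`;
        last if $status =~ /DONE|FINISHED/i;
        sleep 60;
    }

    # Download the output files; they become the input of the next step.
    system('ahe-getoutput') == 0 or die "ahe-getoutput failed\n";
}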
VPH NoE, 2010
Constructing a workflow
ahe-prepare → ahe-start → ahe-monitor → ahe-getoutput
VPH NoE, 2010
Computational Techniques
Ensemble MD is well suited to HPC grids:
• Simulate each system many times from the same starting position
• Each run has randomized atomic energies fitting a certain temperature
• Allows conformational sampling
[Diagram: a start conformation passes through an equilibration protocol (eq1–eq8) and a series of runs (C1, C2, C3, C4, …, Cx), producing end conformations.]
Launch simultaneous runs (60 sims, each 1.5 ns).
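The same command-line clients can also launch an ensemble like the one sketched above. The rough Perl sketch below submits 60 independent runs; the per-run configuration files (and the idea that each encodes a different random seed) and the command arguments are assumptions rather than the real AHE options.

#!/usr/bin/perl
# Sketch of launching an ensemble of 60 MD runs via the AHE clients (arguments assumed).
use strict;
use warnings;

my $n_runs = 60;  # ensemble size from the slide above (each run ~1.5 ns)

for my $run (1 .. $n_runs) {
    # Hypothetical per-run configuration: same start conformation, different seed.
    my $config = sprintf('ensemble_run_%02d.conf', $run);

    system('ahe-prepare', $config) == 0 or die "ahe-prepare failed for run $run\n";
    system('ahe-start') == 0 or die "ahe-start failed for run $run\n";
}

# The runs execute concurrently; ahe-monitor and ahe-getoutput can be called
# later, as in the chained-workflow sketch, to collect the end conformations.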
VPH NoE, 2010
GSEngine
GSEngine is a workflow orchestration engine developed by the ViroLab project. It can be used to orchestrate applications launched by the AHE, and allows services to be orchestrated using both point-and-click and scripting interfaces. Workflows are stored in a repository and shared between users. Many of the aims of ViroLab are similar to those of the VPH-I projects, so GSEngine will be useful here.
Included in the VPH ToolKit: http://toolkit.vph-noe.eu/home/tools/compute-resources/workflow/gsengine.html
VPH NoE, 2010
Closing the loop between simulation and user
Monitoring a running simulation:
Pause, resume or stop if required
Altering parameters in a running simulation
Receive feedback via on-line visualization
Restarting a simulation from a known checkpoint
Reproducibility, provenance
Computational Steering
[Diagram: steering loop linking Simulate, Render, Visualize and Control.]
VPH NoE, 2010
AHE/ReG Steering Integration
[Diagram: the extended AHE client launches and starts both the simulation and the visualization, each instrumented with the steering library; their Steering Web Services register with a registry, the client connects and binds to those endpoints, and simulation and visualization exchange data directly via sockets or files.]
Slide adapted from Andrew Porter
VPH NoE, 2010
Cross-site Runs with MPI-g
MPI-g has been designed to allow single applications to run across multiple machines
Some problems won’t fit on a single machine, and require the RAM/processors of multiple machines on the grid.
MPI-g allows jobs to be turned around faster by using small numbers of processors on several machines, which is essential for clinicians
VPH NoE, 2010
MPI-g Requires Co-Allocation
We can reserve multiple resources for specified time periods
Co-allocation is useful for meta-computing jobs using MPIg, viz and for workflow applications.
We use HARC – the Highly-Available Resource Co-allocator (developed by Jon Maclaren at LSU).
Slide courtesy Jon Maclaren
VPH NoE, 2010
HARC
HARC provides a secure co-allocation service
Multiple Acceptors are used
Works well provided a majority of Acceptors stay alive
Paxos Commit keeps everything in sync
Gives the (distributed) service high availability
Deployment of 7 acceptors → Mean Time To Failure ~ years
Transport-level security using X.509 certificates
HARC is a good platform on which to build portals and other services
XML over HTTPS – simpler than SOAP services
Easy to interoperate with
Very easy to use with the Java Client API
VPH NoE, 2010
Multiscale Applications on European
e-Infrastructures (MAPPER)
VPH
Fusion
Computational Biology
Material Science
Engineering
MAPPER
Distributed Multiscale Computing Needs
Partners: UvA, UCL, UU, PSC, Cyfronet, LMU, UNIGE, Chalmers, MPG
€2.5M, starting 1st October, for 3 years
VPH NoE, 2010
MAPPER Infrastructure
VPH NoE, 2010
Going further
1. Download and install the AHE Server:
Download: http://www.realitygrid.org/AHE
Quick start guide: http://wiki.realitygrid.org/wiki/AHE
2. Host a new application (e.g. /bin/date):
Generic app tutorial: http://wiki.realitygrid.org/wiki/AHE
3. Modify these example workflow scripts to run your application:
http://www.realitygrid.org/AHE/training/ahe-workflow.tgz