Grid Computing
from a solid past to a bright future?
David Groep, NIKHEF DataGrid and VL group
2003-03-14
Grid – more than a hype?
Imagine that you could plug your computer into the wall and have direct access to huge computing resources immediately, just as you plug in a lamp to get instant light. …
Far from being science-fiction, this is the idea the XXXXXX project is about to make into reality.…
from a project brochure in 2001
• Grids and their (science) applications
• Origins of the grid
• What makes a Grid?
• Grid implementations today
• New standards
• Dutch dimensions
Grid – a vision

The GRID: networked data processing centres and “middleware” software as the “glue” of resources.

Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.

Scientific instruments and experiments provide huge amounts of data.
Communities and Apps
ENVISAT
• 10 instruments on board
• 200 Mbps data rate to ground
• 400 TBytes data archived/year
• ~100 ‘standard’ products
• 10+ dedicated facilities in Europe
• ~700 approved science user projects
http://www.esa.int/
Added value for EO
• enhance the ability to access high level products
• allow reprocessing of large historical archives
• data fusion and cross-validation, …
Physics @ CERN

• LHC particle accelerator
• operational in 2007
• 5-10 Petabyte per year
• 150 countries
• > 10000 Users
• lifetime ~ 20 years
level 1 – special hardware: 40 MHz (40 TB/sec)
level 2 – embedded: 75 kHz (75 GB/sec)
level 3 – PCs: 5 kHz (5 GB/sec)
data recording & offline analysis: 100 Hz (100 MB/sec)
The Need for Grids: LHC
http://www.cern.ch/
And More …
Bio-informatics

• For access to data
– Large network bandwidth to access computing centres
– Support of data bank replicas (easier and faster mirroring)
– Distributed data banks
• For interpretation of data
– GRID-enabled algorithms: BLAST on distributed data banks, distributed data mining
And even more …
• financial services, life sciences, strategy evaluation, …
• instant immersive teleconferencing
• remote experimentation
• pre-surgical planning and simulation
Why is the Grid successful?
• Applications need large amounts of data or computation
• Ever larger, distributed user community
• Network grows faster than compute power/storage
Inter-networking systems
• Continuous growth (now ~180 million hosts)
• Many protocols and APIs (~3500 RFCs)
• Focus on heterogeneity (and security)
http://www.caida.org/
http://www.isc.org/
Remote Service
• RPC proved hugely successful within domains
– Network Information System (YP)
– Network File System
– Typical client-server stuff…
• CORBA – also intra-domain
– Extension of RPC to the OO design model
– Diversification
• Web Services – venturing into the inter-organisation domain
– Standard service descriptions and discovery
– Common syntax (XML/SOAP)
Grid beginnings - Systems
• distributed computing research
• Gigabit network test beds
• Meta-supercomputing (I-WAY)
• Condor ‘flocking’
GUSTO meta-computing test bed in 1999
Grid beginnings - Apps
• Solve problems using systems in one ‘domain’
– parameter sweeps on batch clusters
– PIAF for (HE) physics analysis
– …
• Solvers using systems in multiple domains
– SETI@home
– …
• Ready for the next step …
What is the Grid about?
Resource sharing and coordinated problem solving
in dynamic multi-institutional virtual organisations
Virtual Organisation (VO):
A set of individuals or organisations, not under single hierarchical control, temporarily joining forces to solve a particular problem at hand, bringing to the collaboration a subset of their resources, sharing those at their discretion and each under their own conditions.
What makes a Grid?
• Coordinates resources not subject to central control …
– More than cluster & centralised distributed computing
– Security, AAA, billing & payment, integrity, procedures
• … using standard, open protocols …
– More than single-purpose solutions
– Requires interoperability, a standards body, multiple implementations
• … to deliver non-trivial QoS.
– Sum more than individual components (e.g. single sign-on, transparency)
Ian Foster in Grid Today, 2002
Grid Architecture (v1)

• Application
• Collective – “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource – “Sharing single resources”: negotiating access, controlling use
• Connectivity – “Talking to things”: communication (Internet protocols) & security
• Fabric – “Controlling things locally”: access to, & control of, resources

(diagram: the Grid stack shown alongside the Internet Protocol Architecture: Application, Transport, Internet, Link)
Protocol Layers & Bodies
• OSI protocol layers: Physical, Data Link, Network, Transport, Session, Presentation, Application
– Standards bodies: IEEE, IETF
• Grid layers: Fabric, Connectivity, Resource, Collective, Application
– Standards bodies: GGF, W3C, OASIS

(diagram: both stacks shown alongside the Internet Protocol Architecture)
• Globus Project started 1997
• Focus on research only
• Used and extended by many other projects
• Toolkit `bag-of-services' approach – not a complete architecture
• Several middleware projects:
– EU DataGrid – production focus
– CrossGrid, GridLAB, DataTAG, PPDG, GriPhyN
– Condor
– In NL: ICES/KIS Virtual Lab, VL-E
Grid Middleware
http://www.globus.org/
http://www.edg.org/
http://www.vl-e.nl/
Grid Protocols Today
• Use common Grid Security Infrastructure:
– Extensions to TLS for delegation (single sign-on)
– Organisation of users in VOs
• Currently deployed main services
– GRAM (resource allocation): attrib/value pairs over HTTP
– GridFTP (bulk file transfer): FTP with GSI and high-throughput extras (striping)
– MDS (monitoring and discovery service): LDAP + common resource description schema
• Next generation: Grid Services (OGSA)
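GRAM requests were expressed as attribute/value pairs in the Resource Specification Language (RSL). A minimal sketch of composing such a request string; the attribute names shown are illustrative of common RSL usage, not a complete specification:

```python
def make_rsl(attrs):
    """Compose a GRAM RSL-style request string from attribute/value pairs.

    RSL concatenates (name=value) pairs behind an '&' conjunction;
    string values are quoted, numbers are not (simplified rules).
    """
    def fmt(v):
        return '"%s"' % v if isinstance(v, str) else str(v)
    return "&" + "".join("(%s=%s)" % (k, fmt(v)) for k, v in attrs.items())

rsl = make_rsl({"executable": "/bin/hostname", "count": 2, "maxWallTime": 30})
print(rsl)  # &(executable="/bin/hostname")(count=2)(maxWallTime=30)
```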
Grid Security Infrastructure
• Requirements:
– “Secure”
– User identification
– Accountability
– Site autonomy
– Usage control
– Single sign-on
– Dynamic VOs any time and any place
– Mobility (“easyEverything”, airport kiosk, handheld)
– Multiple roles for each user
– Easy!
Authentication – PKI
• Asserting, binding identities
• Trust issues on a global scale
– EDG: CA Coordination Group
• 16 national certification authorities + CrossGrid CAs
• policies & procedures → mutual trust
• users identified by CAs’ certificates
– Part of world-wide GridPMA
• Establishing minimum requirements
• Includes several US and AP CAs
• Scaling still a challenge
EDG CAs
CERN
CESNET
CNRS (3)
GermanGrid
Grid-Ireland
INFN
NIKHEF
NorduGrid
LIP
Russian DataGrid
DATAGRID-ES
GridPP
US–DOE Root CA
US-DOE Sub CA
CrossGrid (*)
http://marianne.in2p3.fr/datagrid/ca and http://www.gridpma.org/
Getting People Together: Virtual Organisations

• The user community ‘out there’ is large & highly dynamic
• Applying at each individual resource does not scale
• Users get together to form Virtual Organisations:
– Temporary alliance of stakeholders (users and/or resources)
– Various groups and roles
– Managed by (legal) contracts
– Set up and dissolved at will*
• Authentication, Authorization, Accounting (AAA)

* currently not yet that fast
Authorization (today)
• Virtual Organisation “directories”
– Members are listed in a directory
– Managed by the VO responsible
– Sites extract access lists from the directories
– Only for VOs they have a “contract” with
– Still need OS-local accounts
– May also use automated tools (sysadmin level)
• poolAccounts
• slashGrid
http://cern.ch/hep-project-grid-scg/
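The site-side mapping from VO members to OS-local accounts can be sketched as follows. The output format mirrors the Globus grid-mapfile (certificate subject DN mapped to a local account name), and the pool-account allocation is a deliberately simplified model of the poolAccounts idea; the DNs and prefix are invented for illustration:

```python
def allocate_pool_accounts(vo_members, pool_prefix="vo", pool_size=50):
    """Assign each VO member DN a pool account (e.g. vo001..vo050).

    Pool accounts avoid creating a personal OS account for every grid
    user; this is a simplified model of the real allocation logic.
    """
    if len(vo_members) > pool_size:
        raise ValueError("pool exhausted")
    mapping = {}
    for i, dn in enumerate(sorted(vo_members), start=1):
        mapping[dn] = "%s%03d" % (pool_prefix, i)
    return mapping

def grid_mapfile_lines(mapping):
    """Render the mapping in grid-mapfile style: "DN" account."""
    return ['"%s" %s' % (dn, acct) for dn, acct in sorted(mapping.items())]

members = {"/O=dutchgrid/O=users/CN=Alice", "/O=dutchgrid/O=users/CN=Bob"}
for line in grid_mapfile_lines(allocate_pool_accounts(members)):
    print(line)
```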
Grid Security in Action
• Key elements in Grid Security Infrastructure (GSI)
– Proxy
– Trusted certificate store
– Delegation: full or restricted rights
• Access services directly
• Establish trust between processes
GSI in Action: “Create processes at A and B that communicate & access files at C”

• User does single sign-on via “grid-id” and generates a proxy credential (or retrieves a proxy credential from an online repository)
• Site A (Kerberos): GSI-enabled GRAM server authorizes the user, maps to a local id, creates the process, and generates credentials (restricted proxy, Kerberos ticket)
• Site B (Unix): ditto, via a remote process creation request
• The two processes establish trust and communicate directly
• Site C (Kerberos): GSI-enabled FTP server authorizes the remote file access request, maps to a local id, and accesses the file on the storage system
• All requests are made with mutual authentication
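The delegation behind GSI proxies can be modelled abstractly: each proxy certificate is signed by the previous credential in the chain, and a verifier walks the chain leaf-first back to a CA it trusts. A toy model of that chain-of-signatures logic only, with no real cryptography and invented names:

```python
from dataclasses import dataclass

@dataclass
class Cert:
    subject: str
    issuer: str          # subject of the credential that signed this one
    is_proxy: bool = False

def verify_chain(chain, trusted_cas):
    """Walk a credential chain [proxy, ..., end-entity cert] leaf-first.

    Each certificate must be issued by the next one in the chain, and
    the end-entity certificate must be issued by a trusted CA.
    """
    for cert, signer in zip(chain, chain[1:]):
        if cert.issuer != signer.subject:
            return False
    root = chain[-1]
    return root.issuer in trusted_cas and not root.is_proxy

ca = "CN=NIKHEF CA"
user = Cert("CN=David Groep", ca)
proxy = Cert("CN=David Groep/CN=proxy", "CN=David Groep", is_proxy=True)
print(verify_chain([proxy, user], {ca}))  # True
```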
Large-scale production Grids
• Until recently usually “smallish”
– O(10) sites, O(20) users
– Only one community (VO)

Running Production Grids
• EU DataGrid (EDG)
– Stress testing: up to 2000 jobs at any time
– Focus on stability (>99% of jobs complete correctly)
• VL-E
• NASA IPG
• LCG, PPDG/iVDGL
EU DataGrid
• Middleware research project (2001-2003)
• Driving applications:
– HE Physics
– Earth Observation
– Biomedicine
• Operational testbed
– 25 sites, 50 CEs
– 8 VOs
– ~350 users, growing by ~50/month!
http://www.eu-datagrid.org/
EU DataGrid Test Bed 1
• DataGrid TB1:
– 14 countries
– 21 major sites
– CrossGrid: 40 more sites
• Submitting jobs:
– Log in only once, run everywhere
– Cross administrative boundaries in a secure and trusted way
– Mutual authorization
http://marianne.in2p3.fr/
EDG: 3-Tier Architecture

• Client: ‘User Interface’
• Execution resources: ‘Compute Element’
• Data servers: ‘Storage Element’, database server

(diagram: requests flow from the client to the compute and data tiers; results and data flow back)
GOME processing cycle

• ‘Raw’ (Level 1) satellite data from the GOME instrument
• ESA – KNMI: processing of raw GOME data to (Level 2) ozone profiles, with Opera and Noprego
• IPSL: validate GOME ozone profiles with ground-based (LIDAR) measurements
• Visualization
• All stages run on the DataGrid
Information Services (IS)

• HARDWARE – fabric and storage: cluster information, storage capacity, network connections
– Today: info-providers publish to a hierarchical IS directory
– Next week: R-GMA, a producer-consumer framework based on an RDBMS
• DATA – files and collections: file replica locations
– Today: Replica Catalogue (RC)
– In a few months: Replica Location Service
• SOFTWARE – programs & services: RunTime Environment tags, service entries (SE, CE, RC)
– Today: in the IS
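The Replica Catalogue idea is a mapping from a logical file name to one or more physical replicas, from which a broker can pick a convenient copy. A minimal sketch; the hostnames, the LFN entry, and the "same domain" preference heuristic are all illustrative assumptions:

```python
# Toy replica catalogue: logical file name -> physical replica URLs.
replica_catalogue = {
    "LF:testbed0-00019": [
        "gridftp://se.nikhef.nl/data/testbed0-00019",
        "gridftp://se.cnaf.infn.it/data/testbed0-00019",
    ],
}

def best_replica(lfn, site_domain):
    """Return a physical replica for a logical file name, preferring
    one hosted in the same domain as the chosen compute site."""
    replicas = replica_catalogue.get(lfn)
    if not replicas:
        raise KeyError("no replica registered for " + lfn)
    local = [r for r in replicas if site_domain in r]
    return local[0] if local else replicas[0]

print(best_replica("LF:testbed0-00019", "nikhef.nl"))
```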
Grid job submission
• Basic protocol: GRAM
– Job submission at an individual CE
– Status inquiries
– Credential delegation
– File staging
– Job manager (baby-sitter)
• Collective services (Workload Management System)
– Resource broker
– Job submission service
– Logging and Bookkeeping
• The EDG WMS tries to optimize the usage of resources
• Will re-submit on resource failure
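The broker's matchmaking step pairs a job's JDL Requirements and Rank expressions against the attributes each CE publishes in the IS. A toy matchmaker sketching that filter-then-rank logic; the CE records and attribute names are invented for illustration:

```python
def match(ces, requirements, rank_attr):
    """Toy matchmaker: keep CEs satisfying the requirements predicate,
    then order them by the rank attribute (highest first)."""
    ok = [ce for ce in ces if requirements(ce)]
    return sorted(ok, key=lambda ce: ce[rank_attr], reverse=True)

ces = [
    {"name": "ce.nikhef.nl", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 8, "MaxCpuTime": 2880},
    {"name": "ce.cern.ch", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 2, "MaxCpuTime": 1440},
]
# Mirrors the example JDL: INTEL + LINUX + at least 4 free CPUs.
req = lambda ce: (ce["Architecture"] == "INTEL" and ce["OpSys"] == "LINUX"
                  and ce["FreeCpus"] >= 4)
print([ce["name"] for ce in match(ces, req, "MaxCpuTime")])  # ['ce.nikhef.nl']
```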
• Information to be specified
– Job characteristics
– Requirements and preferences of the computing system
– Software dependencies
– Job data requirements
– Specified using a Job Description Language (JDL)
Job Preparation
Example JDL File
Executable = "gridTest";
StdError = "stderr.log";
StdOutput = "stdout.log";
InputSandbox = {"home/joda/test/gridTest"};
OutputSandbox = {"stderr.log", "stdout.log"};
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/lc=test, rc=WP2 INFN Test, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >= 4;
Rank = "other.MaxCpuTime";
This JDL is input to dg-job-submit
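A JDL file is just a ClassAd-style attribute list, so generating one programmatically is straightforward. A minimal sketch with deliberately simplified quoting rules (real JDL quoting and expression syntax are richer):

```python
def jdl(attrs):
    """Render a dict as a simple JDL attribute list (simplified quoting).

    Lists become {"a", "b"} sandboxes; strings are quoted unless they
    look like a ClassAd expression referencing 'other.' attributes.
    """
    def fmt(v):
        if isinstance(v, list):
            return "{" + ", ".join('"%s"' % x for x in v) + "}"
        if isinstance(v, str) and not v.startswith("other."):
            return '"%s"' % v
        return str(v)
    return "\n".join("%s = %s;" % (k, fmt(v)) for k, v in attrs.items())

print(jdl({
    "Executable": "gridTest",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "Requirements": 'other.Architecture=="INTEL" && other.FreeCpus >= 4',
}))
```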
Job Submission Scenario

Components: User Interface (UI) with the JDL, Resource Broker (RB), Job Submission Service (JSS), Logging & Bookkeeping (LB), Information Service (IS), Replica Catalogue (RC), Compute Element (CE), Storage Element (SE).

Walk-through (the LB records a job-status event at each step):
• submitted – the UI sends the job-submit event and the input sandbox to the RB
• waiting – the RB holds the job while consulting the IS and RC
• ready – the RB has matched the job to a CE; a BrokerInfo file is prepared
• scheduled – the JSS hands the job to the selected CE; the input sandbox follows
• running – the job executes on the CE, with access to the SE
• done – the job has finished
• outputready – the output sandbox is available for retrieval
• cleared – the UI has fetched the output sandbox and the job record is cleared
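The status progression in the submission scenario can be captured as a small state machine whose transitions mirror the LB events shown on the slides:

```python
# Job-status transitions as shown in the EDG submission scenario slides.
TRANSITIONS = {
    "submitted": {"waiting"},
    "waiting": {"ready"},
    "ready": {"scheduled"},
    "scheduled": {"running"},
    "running": {"done"},
    "done": {"outputready"},
    "outputready": {"cleared"},
    "cleared": set(),
}

class Job:
    def __init__(self):
        self.status = "submitted"
        self.history = ["submitted"]

    def advance(self, new_status):
        """Apply one LB status event, rejecting illegal transitions."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError("illegal transition %s -> %s"
                             % (self.status, new_status))
        self.status = new_status
        self.history.append(new_status)

job = Job()
for s in ["waiting", "ready", "scheduled", "running",
          "done", "outputready", "cleared"]:
    job.advance(s)
print(job.history)
```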
Data Access & Transport
• Requirements
– Support single sign-on
– Transfer large files quickly
– Confidentiality / integrity
– Integrated with information systems (RC)
• Extensions to the FTP protocol: GridFTP
– GSI, DCAU
– Server striping, parallel streams
– TCP protocol optimisation
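Why parallel streams help: over a long-RTT path, each TCP stream is limited to roughly one window per round trip, so n streams can deliver up to n times the single-stream rate until the link capacity caps them. A simplified numeric model of that effect (the window, RTT, and link figures are illustrative):

```python
def tcp_stream_rate(window_bytes, rtt_s):
    """Max throughput of one window-limited TCP stream, in bytes/s."""
    return window_bytes / rtt_s

def parallel_transfer_rate(n_streams, window_bytes, rtt_s, link_capacity):
    """Aggregate rate of n parallel streams, capped by link capacity."""
    return min(n_streams * tcp_stream_rate(window_bytes, rtt_s),
               link_capacity)

# 64 KiB window, 100 ms transatlantic RTT, 1 Gb/s (= 125 MB/s) link:
one = parallel_transfer_rate(1, 64 * 1024, 0.1, 125_000_000)
ten = parallel_transfer_rate(10, 64 * 1024, 0.1, 125_000_000)
print(one, ten)  # a single stream is window-limited; ten streams do ~10x
```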
EDG Storage Element
• Transfer methods:
– gridFTP
– RFIO
– G-HTTPS
• Replica Catalogue
– Yesterday: LDAP directory using GDMP
– Today: Replica Location Service and Giggle
• Backend systems
– Disk storage
– HPSS via HRM
– HPSS with explicit staging
Grid Data Bases ?!
• Database Access and Integration (DAI) WG
– OGSA-DAI integration project
– Data Virtualisation Services
– Standard Data Source Services
• Early emerging standards:
– Grid Data Service specification (GDS)
– Grid Data Service Factory (GDSF)

Largely a spin-off from the UK e-Science effort & DataGrid
Grid Access to Databases
• SpitFire (standard data source services): uniform access to persistent storage on the Grid
• Multiple roles support
• Compatible with GSI (single sign-on) through CoG
• Uses standard stuff: JDBC, SOAP, XML
• Supports various back-end databases
http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/
Spitfire security model

• Standard access to DBs
• GSI SOAP protocol
• Strong authentication
• Supports single sign-on
• Local role repository
• Connection pool to multiple backend DBs

Version 1.0 out, WebServices version in alpha
Bringing Grids to the User
• Core services are too complex to present to scientists → design (graphical/web) portals
• VLAM-G
• GENIUS / EnginFrame
• EDG GUI
• Application-specific interfaces
Grids Around the World
• Many different grid projects
• Different goals (and thus architectures)
• Breadth of applications
– Meta-supercomputing (origin of the Grid)
– High-throughput computing (DataGrids)
– Collaboratories, data fusion grids
– Harnessing idle workstations
– Transaction-oriented grids (industry)
• Interoperability requires standardisation!
Standards Requirements
• GGF established in 2001: merger of the Grid Forum and the eGrid Forum
• Approx. 50 working & research groups
(bar chart: (G)GF meeting attendance per event, 1999-2002)
http://www.ggf.org/
OGSA: current directions
Open Grid Services Architecture … … cleaning up the protocol mess
• Use standard containers (based on web services)
• Based on common standards:
– SOAP, WSDL, UDDI
– Running over an “upgraded” Grid Security Infrastructure (GSI)
• New in OGSA: transient “manageable” services:
– State of distributed activities
– Workflow, multi-media, distributed data analysis
OGSA Roadmap
• Introduced at GGF4 (Toronto, March 2002)
• OGSI definition draft went to final call last week
• First implementation: Globus Toolkit v3
– Currently in alpha testing
– Beta release in July
• Significant effort towards homogeneous interfaces
• Large commitment (world-wide and local)
DutchGrid Platform
(map: sites in Amsterdam, Leiden, Delft, Utrecht, Nijmegen, plus KNMI, TELIN, and ASTRON/JIVE)

• DutchGrid:
– Test bed coordination
– PKI security
– Support
• Participation by:
– NIKHEF, KNMI, SARA
– DAS-2 (ASCI): TU Delft, Leiden, VU, UvA, Utrecht
– Telematics Institute
– FOM, NWO/NCF
– Min. EZ (ICES/KIS)
– IBM, KPN, …

www.dutchgrid.nl
Resources
• ASCI DAS-2 (VU, UvA, Leiden, TU Delft, Utrecht)
– 200 dual P-III 1 GHz CPUs
– homogeneous clusters, 5 locations
• NIKHEF DataGrid clusters
– 75 dual P-III ~1 GHz
– 1 Gb/s IPv4 + 1 Gb/s IPv6
• NCF Grid (national computer facilities foundation of NWO)
– 66-node dual AMD-K7 fabric research cluster (NIKHEF)
– 32-node dual “production quality” cluster (SARA)*
– 10 Gb/s optical “lambda” test bed
– …
• BioASP – various smaller O(1-10 node) clusters
Resources (cont.)
SARA – National HPC Centre
• Processing: SGI 1024-processor MPP
• Mass storage
– StorageTek NearLine tape robot
– currently: 500 TByte
– integrated as an EDG “Storage Element”
• User expertise centre

SURFnet – networking
• 2.5-10 Gb/s international
• 10 Gb/s to dedicated centres (DAS-2, ASTRON)