Introduction to the Grid Peter Kacsuk MTA SZTAKI .
Introduction to the Grid
Peter Kacsuk, MTA SZTAKI
www.lpds.sztaki.hu
© Peter Kacsuk
Agenda
• From Metacomputers to the Grid
• Grid Applications
• Job Managers in the Grid – Condor
• Grid Middleware – Globus
• Grid Application Environments
Grid Computing in the News
Credit to Fran Berman
Real World Distributed Applications
• SETI@home
– 3.8M users in 226 countries
– 1200 CPU years/day
– 38 TF sustained (the Japanese Earth Simulator is 40 TF peak)
– 1.7 zettaflop (10^21 flop, beyond peta and exa) over the last 3 years
– Highly heterogeneous: >77 different processor types
Credit to Fran Berman
Progress in Grid Systems
[Diagram showing the evolution paths toward Grid systems; labels include: supercomputing (PVM/MPI), clusters, high-performance computing, Globus; network computing (sockets), client/server, high-throughput computing, Condor; OO computing (CORBA), web computing (scripts), Object Web, Web Services; Grid Systems, Semantic Grid, OGSA]
Progress to the Grid
[Diagram: GFlops vs. number of computers, progressing from single processor to supercomputer, cluster, and meta-computer]
Original motivation for metacomputing
• Grand challenge problems run for weeks or months even on supercomputers and clusters
• Various supercomputers/clusters must be connected by wide-area networks in order to solve grand challenge problems in a reasonable time
Original meaning of metacomputing
[Diagram: metacomputing = supercomputing + wide-area network]
Original goal of metacomputing:
• Distributed supercomputing to achieve higher performance than individual supercomputers/clusters can provide
Distributed Supercomputing
• Issues:
– Resource discovery, scheduling
– Configuration
– Multiple comm methods
– Message passing (MPI)
– Scalability
– Fault tolerance
[Diagram: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI) running across the NCSA Origin, Caltech Exemplar, Argonne SP and Maui SP]
Technologies for metacomputers
[Diagram: supercomputing + WAN technology + distributed computing = metacomputers]
What is a Metacomputer?
• A metacomputer is a collection of
– computers
– that are heterogeneous in every aspect
– geographically distributed
– connected by a wide-area network
– forming the image of a single computer
• Metacomputing means:
– network-based,
– distributed supercomputing
Further motivations for metacomputing
• Better usage of computing and other resources accessible via wide-area networks
• Various computers must be connected by wide-area networks in order to exploit their spare cycles
• Various special devices must be accessible via wide-area networks for collaborative work
Motivations for grid computing
• To form a computational grid, similar to information access on the web
• Any computers/devices must be connected by wide-area networks in order to form a universal source of computing power
• Grid = generalised metacomputing
Technologies that led to the Grid
[Diagram: supercomputing + network technology + web technology = Grid]
What is a Grid?
• A Grid is a collection of
– computers, storage and other devices
– that are heterogeneous in every aspect
– geographically distributed
– connected by a wide-area network
– forming the image of a single computer
• Generalised metacomputing means:
– network-based,
– distributed computing
Application areas of the Grid
• Distributed supercomputing
• High-throughput computing
– Parameter studies
• Virtual laboratory
– Collaborative design
• Data-intensive applications
– Sky survey, particle physics
• Geographic information systems
• Teleimmersion
• Enterprise architectures
Distributed Supercomputing
• Issues:
– Resource discovery, scheduling
– Configuration
– Multiple comm methods
– Message passing (MPI)
– Scalability
– Fault tolerance
[Diagram: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI) running across the NCSA Origin, Caltech Exemplar, Argonne SP and Maui SP]
High-Throughput Computing
• Schedule many independent tasks
– Parameter studies
– Data analysis
• Issues:
– Resource discovery
– Data access
– Scheduling
– Reservation
– Security
– Accounting
– Code management
[Diagram: Nimrod-G (Monash University) scheduling against cost and deadline over the available machines]
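The parameter-study pattern above, many independent tasks farmed out to whatever workers are free, can be sketched in miniature with a local worker pool. The `simulate` function and the parameter values are hypothetical stand-ins for a real study, not part of Nimrod-G:

```python
# Parameter-study sketch: one independent task per parameter
# combination, scheduled over a small local worker pool.
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

def simulate(pressure, temperature):
    # Placeholder model; a real study would launch a simulation run here.
    return pressure * temperature

def run_study(pressures, temperatures, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(simulate, p, t): (p, t)
                   for p, t in product(pressures, temperatures)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # tasks finish in any order
    return results

if __name__ == "__main__":
    print(sorted(run_study([1, 2], [10, 20]).items()))
```

A real high-throughput system adds exactly the issues listed above on top of this loop: discovering the workers, staging data and code to them, and accounting for their use.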
High-throughput Computing: Condor
• Goal: Exploit the spare cycles of computers in the Grid
• Realization steps (1): Turn your desktop into a personal Condor machine
[Diagram: your workstation runs personal Condor, executing Condor jobs]
Credit to Miron Livny
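Jobs enter a personal Condor through a submit description file. A minimal sketch (the executable name and resource values are hypothetical; the keywords are standard Condor submit-file syntax):

```
# Hypothetical Condor submit description file (sketch)
universe     = vanilla
executable   = my_analysis
arguments    = input.dat
requirements = (OpSys == "LINUX") && (Memory >= 256)
output       = my_analysis.out
error        = my_analysis.err
log          = my_analysis.log
queue
```

The file is handed to `condor_submit`; the requirements expression is what gets matched against the resources' ClassAds.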
High-throughput Computing: Condor
• Realization steps (2): Create your institute-level Condor pool
[Diagram: your workstation's personal Condor forwards Condor jobs to the SZTAKI cluster Condor pool]
Credit to Miron Livny
High-throughput Computing: Condor
• Realization steps (3): Connect “friendly” Condor pools
[Diagram: your workstation's personal Condor forwards Condor jobs to the SZTAKI cluster Condor pool and to the friendly BME Condor pool]
Credit to Miron Livny
High-throughput Computing: Condor
• Realization steps (4): Temporary exploitation of Grid resources via glide-ins
[Diagram: your workstation's personal Condor sends Condor jobs to the SZTAKI cluster Condor pool, the friendly BME Condor pool, and, via glide-ins, to Hungarian Grid resources managed by PBS, LSF and Condor]
Credit to Miron Livny
NUG30 - Solved!!!
• Solved in 7 days instead of 10.9 years
[Chart: number of workers over the first 600K seconds of the run]
Credit to Miron Livny
The Condor model
[Diagram: a resource provider publishes its configuration description as ClassAds to the match-maker; a resource requestor submits its resource requirement; after a match is made over TCP/IP, your program moves to the resource(s)]
Security is a serious problem!
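The match-maker's job can be sketched as checking each requestor's requirement against the attribute sets ("ClassAds") the providers publish. This is a simplified assumption-laden toy: real ClassAd matching is symmetric (both sides state requirements) and ranked, and the attribute names here are illustrative:

```python
# Toy match-maker sketch: providers publish attribute dictionaries
# ("ClassAds"), requestors publish requirement predicates, and the
# match-maker pairs each request with the first satisfying provider.

def match(requests, provider_ads):
    """Return {request_name: provider_name} for satisfiable requests."""
    assignments = {}
    free = dict(provider_ads)          # providers not yet claimed
    for req_name, predicate in requests.items():
        for prov_name, ad in list(free.items()):
            if predicate(ad):
                assignments[req_name] = prov_name
                del free[prov_name]    # each provider serves one job here
                break
    return assignments

providers = {
    "node1": {"OpSys": "LINUX", "Memory": 512},
    "node2": {"OpSys": "LINUX", "Memory": 2048},
}
requests = {
    "job1": lambda ad: ad["OpSys"] == "LINUX" and ad["Memory"] >= 1024,
    "job2": lambda ad: ad["OpSys"] == "LINUX",
}

if __name__ == "__main__":
    print(match(requests, providers))
```

Once matched, the program moves to the chosen resource, which is exactly where the security problem noted above begins: foreign code runs on the provider's machine.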
Generic Grid Architecture
[Layered diagram:
• Application Environments: appl. dev. environments, analysis & visualisation, collaboratories, problem solving environments, Grid portals
• Application Support: MPI, Condor, CORBA, Java/Jini, OLE/DCOM, other
• Grid Common Services: information services, global scheduling, data access/caching, resource co-allocation, authentication, authorisation, monitoring, fault management, policy, accounting
• Grid Fabric (local resources): CPUs, tertiary storage, online storage, communications, scientific instruments, resource management]
Middleware concepts
• Goal of the middleware:
– to turn a radically heterogeneous environment into a virtually homogeneous one
• Three main concepts:
– Toolkit (mix-and-match) approach: Globus
– Object-oriented approach: Legion, Globe
– Commodity Internet/WWW approach: Web services
Globus Layered Architecture
[Layered diagram:
• Applications
• Application Toolkits: DUROC, globusrun, MPI, Nimrod/G, Condor-G, HPC++, GlobusView, Testbed Status
• Grid Services: GRAM, GSI, HBM, Nexus, I/O, GASS, GSI-FTP, MDS-2
• Grid Fabric: LSF, Condor, MPI, NQE, PBS, TCP, UDP, Linux, NT, Solaris, DiffServ]
Globus Approach: Hourglass
[Diagram: the GRAM protocol forms the neck of an hourglass, with high-level services (resource brokers, resource co-allocators) above it and low-level tools (Condor, LSF, NQE, PBS, etc.) below it; the analogy is the Internet Protocol sitting between TCP, FTP, HTTP, etc. above and Ethernet, ATM, FDDI, etc. below]
Globus hierarchical resource management architecture
[Diagram: the application passes an RSL specification (“Run DIS with 100K entities”) to brokers, which consult the information service (MDS-2); co-allocators refine it into simple ground RSL (“Run SF-Express on 80 nodes”, “Run SF-Express on 256 nodes”); GRAM instances hand the ground RSL (80 nodes on the Argonne SP-2, 256 nodes on the CIT Exemplar) to the local resource managers (Argonne Resource Manager, SDSC Resource Manager)]
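A ground RSL specification of the kind a GRAM receives might look like the following sketch. The executable path and values are illustrative, but the attributes (`executable`, `count`, `jobtype`, `maxtime`, `directory`) are standard GRAM RSL attributes:

```
& (executable = /usr/local/bin/sf_express)
  (count = 80)
  (jobtype = mpi)
  (maxtime = 60)
  (directory = /home/user)
```

The broker and co-allocator layers above GRAM work by rewriting abstract RSL ("run DIS with 100K entities") into concrete specifications like this, one per local resource manager.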
The Globus Model
[Diagram: the resource provider publishes its configuration description to the information system (MDS-2); the resource requestor obtains the resource description through the MDS-2 API and submits through the GRAM API; your program moves to the resource(s)]
Security is a serious problem!
“Standard” MDS Architecture (MDS-2)
• Resources run a standard information service (GRIS) which speaks LDAP and provides information about the resource (no searching).
• The GIIS provides a “caching” service much like a web search engine. Resources register with the GIIS, and the GIIS pulls information from them when requested by a client and the cache has expired.
• The GIIS provides the collective-level indexing/searching function.
[Diagram: Clients 1 and 2 request info directly from the GRIS services of Resources A and B; Client 3 uses the GIIS for searching collective information; the GIIS cache contains info from A and B, requested from the GRIS services as needed]
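The GIIS behaviour described above, pulling from a GRIS only on demand and serving from a cache until it expires, can be sketched as a TTL cache. The resource names and info payloads are hypothetical, and a real MDS-2 speaks LDAP rather than calling Python functions:

```python
import time

# Toy GIIS sketch: registered resources are polled lazily, and answers
# are served from cache until the entry's time-to-live expires.
class GIIS:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.registered = {}   # resource name -> GRIS query callable
        self.cache = {}        # resource name -> (timestamp, info)

    def register(self, name, gris_query):
        self.registered[name] = gris_query

    def lookup(self, name):
        """Serve from cache; pull from the GRIS only when expired."""
        now = time.monotonic()
        if name in self.cache:
            fetched_at, info = self.cache[name]
            if now - fetched_at < self.ttl:
                return info
        info = self.registered[name]()        # pull from GRIS as needed
        self.cache[name] = (now, info)
        return info

    def search(self, predicate):
        """Collective-level search across all registered resources."""
        return [n for n in self.registered if predicate(self.lookup(n))]

if __name__ == "__main__":
    giis = GIIS()
    giis.register("A", lambda: {"cpus": 16})
    giis.register("B", lambda: {"cpus": 128})
    print(giis.search(lambda info: info["cpus"] >= 64))
```

Clients 1 and 2 in the diagram correspond to calling the resource's query directly; Client 3 corresponds to `search`, which only the index can answer.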
Grid Security Infrastructure (GSI)
• PKI (CAs and certificates) for credentials
• SSL (Secure Socket Layer) for authentication and message protection
• Proxies and delegation (GSI extensions) for secure single sign-on
Grid application environments
• Integrated environments
– Cactus
– P-GRADE (Parallel Grid Run-time and Application Development Environment)
• Application-specific environments
– NetSolve
• Problem solving environments
• Grid portals
A Collaborative Grid Environment based on Cactus
[Diagram: Grid-enabled Cactus runs on distributed machines (Origin at NCSA, T3E at Garching); simulations are launched from the Cactus Portal; remote steering and monitoring from an airport; remote viz in St Louis; remote viz and steering from Berlin (Globus, http, HDF5, IsoSurfaces); viz of data from previous simulations in a Vienna café (DataGrid/DPSS, downsampling)]
Credit to Ed Seidel
P-GRADE: Software Development and Execution
[Diagram: edit and debugging, performance analysis, and execution on the Grid]
Nowcast Meteorology Application in P-GRADE
[Diagram: application components with multiplicities 25x, 10x, 25x and 5x]
Performance visualisation in P-GRADE
Nowcast Meteorology Application in P-GRADE
[Diagram: the application partitioned into five jobs (1st to 5th), with component multiplicities 25x, 10x, 25x and 5x]
Layers of TotalGrid
[Layered stack, bottom to top: Internet/Ethernet, PVM or MPI, Condor or SGE, PERL-GRID, P-GRADE]
PERL-GRID
• A thin layer for
– Grid-level job management between P-GRADE and various local job managers, like
• Condor
• SGE, etc.
– file staging
• Application in the Hungarian Cluster Grid
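PERL-GRID's translation role, one job description mapped onto whichever local job manager a site runs, can be sketched as a dispatch table. The layer itself is Perl; this Python sketch and the generated submit texts are illustrative simplifications, not PERL-GRID's real output:

```python
# Sketch of a thin job-management layer: one abstract job description
# rendered into the submit format of different local job managers.
def to_condor(job):
    # Simplified Condor submit description
    return (f"executable = {job['exe']}\n"
            f"arguments  = {job['args']}\n"
            "queue\n")

def to_sge(job):
    # Simplified SGE job script with a name directive
    return (f"#$ -N {job['name']}\n"
            f"{job['exe']} {job['args']}\n")

BACKENDS = {"condor": to_condor, "sge": to_sge}

def stage_and_submit(job, backend):
    """Render the job for the site's local job manager."""
    return BACKENDS[backend](job)

if __name__ == "__main__":
    job = {"name": "nowcast", "exe": "./meteo", "args": "region.cfg"}
    print(stage_and_submit(job, "condor"))
```

The file-staging half of the layer is the same idea applied to data: copy inputs to wherever the chosen backend will run the job, and copy results back.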
Hungarian Cluster Grid Initiative
• Goal: To connect 99 new clusters of the Hungarian higher education institutions into a Grid
• Each cluster contains 20 PCs and a network server PC.
– Day-time: the components of the clusters are used for education
– At night: all the clusters are connected to the Hungarian Grid by the Hungarian academic network (2.5 Gbit/s)
– Total Grid capacity by the end of 2003: 2079 PCs
• Current status:
– About 400 PCs are already connected at 8 universities
– Condor-based Grid system
– VPN (Virtual Private Network)
• Open Grid: other clusters can join at any time
Structure of the Hungarian Cluster Grid
[Diagram: clusters connected by the 2.5 Gb/s Internet backbone; 2003: 99 x 21-PC Linux clusters, 2079 PCs in total; each cluster runs Condor => TotalGrid]
43
Problem Solving Environments
• Examples:
– Problem solving env. for computational chemistry
– Application web portals
• Issues:
– Remote job submission, monitoring, and control
– Resource discovery
– Distributed data archive
– Security
– Accounting
[Screenshot: ECCE’, Pacific Northwest National Laboratory]
Grid Portals
• GridPort (https://gridport.npaci.edu)
• Grid Resource Broker (GRB) (http://sara.unile.it/grb)
• Grid Portal Development Kit (GPDK) (http://www.doesciencegrid.org/Grid)
• Genius (http://www.infn.it/grid)
GPDK
Genius
Summary
• Grid is a new technology which integrates:
– Supercomputing
– Wide-area network technology
– WWW technology
• The computational Grid will lead to a new infrastructure similar to the electrical grid
• This infrastructure will have a tremendous influence on the Information Society