Introduction to the Grid Peter Kacsuk MTA SZTAKI .
Introduction to the Grid
Peter Kacsuk, MTA SZTAKI
www.lpds.sztaki.hu
© Peter Kacsuk
Agenda
• From Metacomputers to the Grid
• Grid Applications
• Job Managers in the Grid – Condor
• Grid Middleware – Globus
• Grid Application Environments
Grid Computing in the News
Credit to Fran Berman
Real World Distributed Applications
• SETI@home
– 3.8M users in 226 countries
– 1200 CPU years/day
– 38 TF sustained (the Japanese Earth Simulator is 40 TF peak)
– 1.7 zettaflop (10^21 flop, beyond peta and exa) over the last 3 years
– Highly heterogeneous: >77 different processor types
Credit to Fran Berman
Progress in Grid Systems
[Diagram showing the evolution paths toward Grid systems; labels include: supercomputing (PVM/MPI), clusters, high-performance computing, Globus; network computing (sockets), client/server, high-throughput computing, Condor; OO computing (CORBA), web computing (scripts), Object Web, Web Services; Grid Systems, Semantic Grid, OGSA]
Progress to the Grid
[Diagram: GFlops vs. number of computers, progressing from single processor to supercomputer, cluster, and meta-computer]
Original motivation for metacomputing
• Grand challenge problems run for weeks or months even on supercomputers and clusters
• Various supercomputers/clusters must be connected by wide-area networks in order to solve grand challenge problems in a reasonable time
Original meaning of metacomputing
[Diagram: metacomputing = supercomputing + wide-area network]
Original goal of metacomputing:
• Distributed supercomputing to achieve higher performance than individual supercomputers/clusters can provide
Distributed Supercomputing
• Issues:
– Resource discovery, scheduling
– Configuration
– Multiple comm methods
– Message passing (MPI)
– Scalability
– Fault tolerance
[Diagram: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI) running across the NCSA Origin, Caltech Exemplar, Argonne SP and Maui SP]
Technologies for metacomputers
[Diagram: supercomputing + WAN technology + distributed computing = metacomputers]
What is a Metacomputer?
• A metacomputer is a collection of
– computers
– that are heterogeneous in every aspect
– geographically distributed
– connected by a wide-area network
– forming the image of a single computer
• Metacomputing means:
– network-based,
– distributed supercomputing
Further motivations for metacomputing
• Better usage of computing and other resources accessible via wide-area networks
• Various computers must be connected by wide-area networks in order to exploit their spare cycles
• Various special devices must be accessible via wide-area networks for collaborative work
Motivations for grid computing
• To form a computational grid, similar to information access on the web
• Any computers/devices must be connected by wide-area networks in order to form a universal source of computing power
• Grid = generalised metacomputing
Technologies that led to the Grid
[Diagram: supercomputing + network technology + web technology = Grid]
What is a Grid?
• A Grid is a collection of
– computers, storage and other devices
– that are heterogeneous in every aspect
– geographically distributed
– connected by a wide-area network
– forming the image of a single computer
• Generalised metacomputing means:
– network-based,
– distributed computing
Application areas of the Grid
• Distributed supercomputing
• High-throughput computing
– Parameter studies
• Virtual laboratory
– Collaborative design
• Data-intensive applications
– Sky survey, particle physics
• Geographic information systems
• Teleimmersion
• Enterprise architectures
Distributed Supercomputing
• Issues:
– Resource discovery, scheduling
– Configuration
– Multiple comm methods
– Message passing (MPI)
– Scalability
– Fault tolerance
[Diagram: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI) running across the NCSA Origin, Caltech Exemplar, Argonne SP and Maui SP]
High-Throughput Computing
• Schedule many independent tasks
– Parameter studies
– Data analysis
• Issues:
– Resource discovery
– Data access
– Scheduling
– Reservation
– Security
– Accounting
– Code management
[Diagram: Nimrod-G (Monash University) scheduling against cost and deadline over the available machines]
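The parameter-study pattern above, many independent tasks farmed out to whatever workers are free, can be sketched in miniature with a local worker pool. The `simulate` function and the parameter values are hypothetical stand-ins for a real study, not part of Nimrod-G:

```python
# Parameter-study sketch: one independent task per parameter
# combination, scheduled over a small local worker pool.
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

def simulate(pressure, temperature):
    # Placeholder model; a real study would launch a simulation run here.
    return pressure * temperature

def run_study(pressures, temperatures, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(simulate, p, t): (p, t)
                   for p, t in product(pressures, temperatures)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # tasks finish in any order
    return results

if __name__ == "__main__":
    print(sorted(run_study([1, 2], [10, 20]).items()))
```

A real high-throughput system adds exactly the issues listed above on top of this loop: discovering the workers, staging data and code to them, and accounting for their use.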
High-throughput Computing: Condor
• Goal: Exploit the spare cycles of computers in the Grid
• Realization steps (1): Turn your desktop into a personal Condor machine
[Diagram: your workstation runs personal Condor, executing Condor jobs]
Credit to Miron Livny
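Jobs enter a personal Condor through a submit description file. A minimal sketch (the executable name and resource values are hypothetical; the keywords are standard Condor submit-file syntax):

```
# Hypothetical Condor submit description file (sketch)
universe     = vanilla
executable   = my_analysis
arguments    = input.dat
requirements = (OpSys == "LINUX") && (Memory >= 256)
output       = my_analysis.out
error        = my_analysis.err
log          = my_analysis.log
queue
```

The file is handed to `condor_submit`; the requirements expression is what gets matched against the resources' ClassAds.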
High-throughput Computing: Condor
• Realization steps (2): Create your institute-level Condor pool
[Diagram: your workstation's personal Condor forwards Condor jobs to the SZTAKI cluster Condor pool]
Credit to Miron Livny
High-throughput Computing: Condor
• Realization steps (3): Connect “friendly” Condor pools
[Diagram: your workstation's personal Condor forwards Condor jobs to the SZTAKI cluster Condor pool and to the friendly BME Condor pool]
Credit to Miron Livny
High-throughput Computing: Condor
• Realization steps (4): Temporary exploitation of Grid resources via glide-ins
[Diagram: your workstation's personal Condor sends Condor jobs to the SZTAKI cluster Condor pool, the friendly BME Condor pool, and, via glide-ins, to Hungarian Grid resources managed by PBS, LSF and Condor]
Credit to Miron Livny
NUG30 - Solved!!!
• Solved in 7 days instead of 10.9 years
[Chart: number of workers over the first 600K seconds of the run]
Credit to Miron Livny
The Condor model
[Diagram: a resource provider publishes its configuration description as ClassAds to the match-maker; a resource requestor submits its resource requirement; after a match is made over TCP/IP, your program moves to the resource(s)]
Security is a serious problem!
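The match-maker's job can be sketched as checking each requestor's requirement against the attribute sets ("ClassAds") the providers publish. This is a simplified assumption-laden toy: real ClassAd matching is symmetric (both sides state requirements) and ranked, and the attribute names here are illustrative:

```python
# Toy match-maker sketch: providers publish attribute dictionaries
# ("ClassAds"), requestors publish requirement predicates, and the
# match-maker pairs each request with the first satisfying provider.

def match(requests, provider_ads):
    """Return {request_name: provider_name} for satisfiable requests."""
    assignments = {}
    free = dict(provider_ads)          # providers not yet claimed
    for req_name, predicate in requests.items():
        for prov_name, ad in list(free.items()):
            if predicate(ad):
                assignments[req_name] = prov_name
                del free[prov_name]    # each provider serves one job here
                break
    return assignments

providers = {
    "node1": {"OpSys": "LINUX", "Memory": 512},
    "node2": {"OpSys": "LINUX", "Memory": 2048},
}
requests = {
    "job1": lambda ad: ad["OpSys"] == "LINUX" and ad["Memory"] >= 1024,
    "job2": lambda ad: ad["OpSys"] == "LINUX",
}

if __name__ == "__main__":
    print(match(requests, providers))
```

Once matched, the program moves to the chosen resource, which is exactly where the security problem noted above begins: foreign code runs on the provider's machine.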
Generic Grid Architecture
[Layered diagram:
• Application Environments: appl. dev. environments, analysis & visualisation, collaboratories, problem solving environments, Grid portals
• Application Support: MPI, Condor, CORBA, Java/Jini, OLE/DCOM, other
• Grid Common Services: information services, global scheduling, data access/caching, resource co-allocation, authentication, authorisation, monitoring, fault management, policy, accounting
• Grid Fabric (local resources): CPUs, tertiary storage, online storage, communications, scientific instruments, resource management]
Middleware concepts
• Goal of the middleware:
– to turn a radically heterogeneous environment into a virtually homogeneous one
• Three main concepts:
– Toolkit (mix-and-match) approach: Globus
– Object-oriented approach: Legion, Globe
– Commodity Internet/WWW approach: Web services
Globus Layered Architecture
[Layered diagram:
• Applications
• Application Toolkits: DUROC, globusrun, MPI, Nimrod/G, Condor-G, HPC++, GlobusView, Testbed Status
• Grid Services: GRAM, GSI, HBM, Nexus, I/O, GASS, GSI-FTP, MDS-2
• Grid Fabric: LSF, Condor, MPI, NQE, PBS, TCP, UDP, Linux, NT, Solaris, DiffServ]
Globus Approach: Hourglass
[Diagram: the GRAM protocol forms the neck of an hourglass, with high-level services (resource brokers, resource co-allocators) above it and low-level tools (Condor, LSF, NQE, PBS, etc.) below it; the analogy is the Internet Protocol sitting between TCP, FTP, HTTP, etc. above and Ethernet, ATM, FDDI, etc. below]
Globus hierarchical resource management architecture
[Diagram: the application passes an RSL specification (“Run DIS with 100K entities”) to brokers, which consult the information service (MDS-2); co-allocators refine it into simple ground RSL (“Run SF-Express on 80 nodes”, “Run SF-Express on 256 nodes”); GRAM instances hand the ground RSL (80 nodes on the Argonne SP-2, 256 nodes on the CIT Exemplar) to the local resource managers (Argonne Resource Manager, SDSC Resource Manager)]
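A ground RSL specification of the kind a GRAM receives might look like the following sketch. The executable path and values are illustrative, but the attributes (`executable`, `count`, `jobtype`, `maxtime`, `directory`) are standard GRAM RSL attributes:

```
& (executable = /usr/local/bin/sf_express)
  (count = 80)
  (jobtype = mpi)
  (maxtime = 60)
  (directory = /home/user)
```

The broker and co-allocator layers above GRAM work by rewriting abstract RSL ("run DIS with 100K entities") into concrete specifications like this, one per local resource manager.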
The Globus Model
[Diagram: the resource provider publishes its configuration description to the information system (MDS-2); the resource requestor obtains the resource description through the MDS-2 API and submits through the GRAM API; your program moves to the resource(s)]
Security is a serious problem!
“Standard” MDS Architecture (MDS-2)
• Resources run a standard information service (GRIS) which speaks LDAP and provides information about the resource (no searching).
• The GIIS provides a “caching” service much like a web search engine. Resources register with the GIIS, and the GIIS pulls information from them when requested by a client and the cache has expired.
• The GIIS provides the collective-level indexing/searching function.
[Diagram: Clients 1 and 2 request info directly from the GRIS services of Resources A and B; Client 3 uses the GIIS for searching collective information; the GIIS cache contains info from A and B, requested from the GRIS services as needed]
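The GIIS behaviour described above, pulling from a GRIS only on demand and serving from a cache until it expires, can be sketched as a TTL cache. The resource names and info payloads are hypothetical, and a real MDS-2 speaks LDAP rather than calling Python functions:

```python
import time

# Toy GIIS sketch: registered resources are polled lazily, and answers
# are served from cache until the entry's time-to-live expires.
class GIIS:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.registered = {}   # resource name -> GRIS query callable
        self.cache = {}        # resource name -> (timestamp, info)

    def register(self, name, gris_query):
        self.registered[name] = gris_query

    def lookup(self, name):
        """Serve from cache; pull from the GRIS only when expired."""
        now = time.monotonic()
        if name in self.cache:
            fetched_at, info = self.cache[name]
            if now - fetched_at < self.ttl:
                return info
        info = self.registered[name]()        # pull from GRIS as needed
        self.cache[name] = (now, info)
        return info

    def search(self, predicate):
        """Collective-level search across all registered resources."""
        return [n for n in self.registered if predicate(self.lookup(n))]

if __name__ == "__main__":
    giis = GIIS()
    giis.register("A", lambda: {"cpus": 16})
    giis.register("B", lambda: {"cpus": 128})
    print(giis.search(lambda info: info["cpus"] >= 64))
```

Clients 1 and 2 in the diagram correspond to calling the resource's query directly; Client 3 corresponds to `search`, which only the index can answer.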
Grid Security Infrastructure (GSI)
• PKI (CAs and certificates) for credentials
• SSL (Secure Socket Layer) for authentication and message protection
• Proxies and delegation (GSI extensions) for secure single sign-on
Grid application environments
• Integrated environments
– Cactus
– P-GRADE (Parallel Grid Run-time and Application Development Environment)
• Application-specific environments
– NetSolve
• Problem solving environments
• Grid portals
A Collaborative Grid Environment based on Cactus
[Diagram: Grid-enabled Cactus runs on distributed machines (Origin at NCSA, T3E at Garching); simulations are launched from the Cactus Portal; remote steering and monitoring from an airport; remote viz in St Louis; remote viz and steering from Berlin (Globus, http, HDF5, IsoSurfaces); viz of data from previous simulations in a Vienna café (DataGrid/DPSS, downsampling)]
Credit to Ed Seidel
P-GRADE: Software Development and Execution
[Diagram: edit and debugging, performance analysis, and execution on the Grid]
Nowcast Meteorology Application in P-GRADE
[Diagram: application components with multiplicities 25x, 10x, 25x and 5x]
Performance visualisation in P-GRADE
Nowcast Meteorology Application in P-GRADE
[Diagram: the application partitioned into five jobs (1st to 5th), with component multiplicities 25x, 10x, 25x and 5x]
Layers of TotalGrid
[Layered stack, bottom to top: Internet/Ethernet, PVM or MPI, Condor or SGE, PERL-GRID, P-GRADE]
PERL-GRID
• A thin layer for
– Grid-level job management between P-GRADE and various local job managers, like
• Condor
• SGE, etc.
– file staging
• Application in the Hungarian Cluster Grid
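PERL-GRID's translation role, one job description mapped onto whichever local job manager a site runs, can be sketched as a dispatch table. The layer itself is Perl; this Python sketch and the generated submit texts are illustrative simplifications, not PERL-GRID's real output:

```python
# Sketch of a thin job-management layer: one abstract job description
# rendered into the submit format of different local job managers.
def to_condor(job):
    # Simplified Condor submit description
    return (f"executable = {job['exe']}\n"
            f"arguments  = {job['args']}\n"
            "queue\n")

def to_sge(job):
    # Simplified SGE job script with a name directive
    return (f"#$ -N {job['name']}\n"
            f"{job['exe']} {job['args']}\n")

BACKENDS = {"condor": to_condor, "sge": to_sge}

def stage_and_submit(job, backend):
    """Render the job for the site's local job manager."""
    return BACKENDS[backend](job)

if __name__ == "__main__":
    job = {"name": "nowcast", "exe": "./meteo", "args": "region.cfg"}
    print(stage_and_submit(job, "condor"))
```

The file-staging half of the layer is the same idea applied to data: copy inputs to wherever the chosen backend will run the job, and copy results back.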
Hungarian Cluster Grid Initiative
• Goal: To connect 99 new clusters of the Hungarian higher education institutions into a Grid
• Each cluster contains 20 PCs and a network server PC.
– Day-time: the components of the clusters are used for education
– At night: all the clusters are connected to the Hungarian Grid by the Hungarian academic network (2.5 Gbit/s)
– Total Grid capacity by the end of 2003: 2079 PCs
• Current status:
– About 400 PCs are already connected at 8 universities
– Condor-based Grid system
– VPN (Virtual Private Network)
• Open Grid: other clusters can join at any time
Structure of the Hungarian Cluster Grid
[Diagram: clusters connected by the 2.5 Gb/s Internet backbone; 2003: 99 x 21-PC Linux clusters, 2079 PCs in total; each cluster runs Condor => TotalGrid]
43
Problem Solving Environments
• Examples:
– Problem solving env. for computational chemistry
– Application web portals
• Issues:
– Remote job submission, monitoring, and control
– Resource discovery
– Distributed data archive
– Security
– Accounting
[Screenshot: ECCE’, Pacific Northwest National Laboratory]
Grid Portals
• GridPort (https://gridport.npaci.edu)
• Grid Resource Broker (GRB) (http://sara.unile.it/grb)
• Grid Portal Development Kit (GPDK) (http://www.doesciencegrid.org/Grid)
• Genius (http://www.infn.it/grid)
GPDK
Genius
Summary
• Grid is a new technology which integrates:
– Supercomputing
– Wide-area network technology
– WWW technology
• The computational Grid will lead to a new infrastructure similar to the electrical grid
• This infrastructure will have a tremendous influence on the Information Society