“Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling,...

21
“Internet-Wide Global Supercomputing” Jonathan Giddy CRC for Enterprise Distributed Systems Technology (DSTC) David Abramson and Rajkumar Buyya School of Computer Science and Software Engineering Monash University, Melbourne, Australia [email protected], {davida, rajkumar)@csse.monash.edu.au http://www.csse.monash.edu.au/~rajkumar/ecogrid/ Thanks to: Jack Dongarra AUUG’2000 Canberra Agenda Computing Platforms and How Grid is Different ? Towards Global (Grid) Computing Grid Resource Management and Scheduling Challenges Grid Technologies Grid Substrates and Interoperability Grid Middleware Resource Brokers (Nimrod/G) and High Level Tools Conclusion

Transcript of “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling,...

Page 1: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

“Internet-Wide Global Supercomputing”

Jonathan Giddy CRC for Enterprise Distributed Systems Technology (DSTC)

David Abramson and Rajkumar BuyyaSchool of Computer Science and Software Engineering

Monash University, Melbourne, Australia

[email protected], {davida, rajkumar)@csse.monash.edu.au

http://www.csse.monash.edu.au/~rajkumar/ecogrid/Thanks to: Jack Dongarra

AUUG’2000Canberra

Agenda• Computing Platforms and How Grid is Different ?• Towards Global (Grid) Computing• Grid Resource Management and Scheduling

Challenges• Grid Technologies

– Grid Substrates and Interoperability– Grid Middleware– Resource Brokers (Nimrod/G) and High Level

Tools• Conclusion

Page 2: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Computing Power (HPC) Drivers

Solving grand challenge applications usingcomputer modeling, simulation and analysis

Life Sciences

CAD/CAM

Aerospace

Military ApplicationsDigital Biology Military ApplicationsMilitary Applications

E-commerce/anything

How to Run App. Faster ?

• There are 3 ways to improve performance:– 1. Work Harder– 2. Work Smarter– 3. Get Help

• Computer Analogy

– 1. Use faster hardware: e.g. reduce the timeper instruction (clock cycle).

– 2. Optimized algorithms and techniques– 3. Multiple computers to solve problem:

That is, increase no. of instructionsexecuted per clock cycle.

Page 3: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

2100

2100 2100 2100 2100

2100 2100 2100 2100

Desktop(Single Processor?)

SMPs orSuperCom

puters

LocalCluster

GlobalCluster/Grid

PERFORMANCE

Computing Platforms EvolutionBreaking Administrative Barriers

Inter PlanetCluster/Grid ??

IndividualGroupDepartmentCampusStateNationalGlobeInter PlanetUniverse

Administrative Barriers

EnterpriseCluster/Grid

?

Towards Grid Computing….

For illustration, placed resources arbitrarily on the GUSTO test-bed!!

Page 4: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

What is Grid ?

• An infrastructure that couples– Computers (PCs, workstations, clusters, traditional

supercomputers, and even laptops, notebooks, mobilecomputers, PDA, and so on) …

– Software ? (e.g., ASPs renting expensive specialpurpose applications on demand)

– Databases (e.g., transparent access to human genomedatabase)

– Special Instruments (e.g., radio telescope--SETI@HomeSearching for Life in galaxy, Austrophysics@Swinburnefor pulsars)

– People (may be even animals who knows ?)

• across the local/wide-area networks (enterprise,organisations, or Internet) and presents them asan unified integrated (single) resource.

Conceptual view of the Grid

Leading to Portal (Super)Computing

http://www.sun.com/hpc/

Page 5: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Grid Application-Drivers

• Old and New applications getting enabled due tocoupling of computers, databases, instruments,people, etc:

– (distributed) Supercomputing– Collaborative engineering– high-throughput computing

– large scale simulation & parameter studies– Remote software access / Renting Software– Data-intensive computing– On-demand computing

The Grid Vision: To offer

“Dependable, consistent, pervasiveaccess to

[high-end] resources”• Dependable: Can provide

performance and functionalityguarantees

• Consistent: Uniform interfaces to awide variety of resources

• Pervasive: Ability to “plug in” fromanywhere

Source: www.globus.org

Page 6: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Sources of Complexity in GridResource Management

• No single adminstrative control• No single policy

– each resource owners have their own policiesor scheduling mechanisms.

– Users must honor them (particularly externalusers of the grid).

• Heterogenity of resources (static and dynamic)• Unreliable - resource may come or disappear (die)• No uniform cost model (it cannot be)

– varies from one user to another and time totime.

• No Single access mechanism

'RPDLQ��

'RPDLQ��

Grid Resource Management:Challenging Issues

Ack.: globus..

�������������� ����

�����������������

�������������������

�����������������

����������������������

������������ ��������

��!�������������

���������"���

�#����������������

����������������

���������������$����

� �������������������

���������������

Page 7: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

1/99 5

Goals of ApplicationGoals of ApplicationDevelopment SupportDevelopment Support

◆ ������������� ��½ EH�HDV\�WR�GHYHORS

½ EH�SRUWDEOH

½ DFKLHYH�KLJK�SHUIRUPDQFHª DV�FORVH�WR�ZKDW�LV�SRVVLEOH�E\�KDQG

◆ ��������������������½ VKRXOG�EH�DEOH�WR�FRQFHQWUDWH�RQ�SUREOHP�DQDO\VLV�DQG

GHFRPSRVLWLRQ

◆ �����½ VKRXOG�KDQGOH�GHWDLOV�RI�PDSSLQJ�DEVWUDFW

GHFRPSRVLWLRQ�RQWR�FRPSXWLQJ�FRQILJXUDWLRQ

◆ ������������������½ VKRXOG�ZRUN�WRJHWKHU�WR�GHEXJ�DQG�WXQH�WKH�SURJUDP

http://www.netlib.org/utk/people/JackDongarra/ http://www.netlib.org/utk/people/JackDongarra/

Page 8: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Grid Components

GridFabricNetworked Resources across Organisations

Computers Clusters Data Sources Scientific InstrumentsStorage Systems

Local Resource Managers

Operating Systems Queuing Systems TCP/IP & UDP

Libraries & App Kernels …

Distributed Resources Coupling Services

Comm. Sign on & Security Information … QoSProcess Data Access

Development Environments and Tools

Languages Libraries Debuggers … Web toolsResource BrokersMonitoring

Applications and Portals

Prob. Solving Env.Scientific …CollaborationEngineering Web enabled Apps

GridApps.

GridMiddleware

GridTools

Many GRID Projects and Initiatives

• PUBLIC FORUMS– Computing Portals– Grid Forum– European Grid Forum– IEEE TFCC!– GRID’2000 and more.

• Australia– Nimrod/G– EcoGrid and GRACE– DISCWorld

• Europe– UNICORE– MOL– METODIS– Globe– Poznan Metacomputing– CERN Data Grid– MetaMPI– DAS– JaWS– and many more...

• Public Grid Initiatives– Distributed.net– SETI@Home– Compute Power Grid

• USA– Globus– Legion– JAVELIN– AppLes– NASA IPG– Condor– Harness– NetSolve– NCSA Workbench– WebFlow– EveryWhere– and many more...

• Japan– Ninf– Bricks– and many more...

http://www.gridcomputing.com/

Page 9: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Many GRID Testbeds...

GUSTO

Distributed ASCI SupercomputerNASA IPG

1/99 14

Globus ProjectGlobus Project: Argonne National Lab &: Argonne National Lab &Information Sciences Inst. (Foster & Information Sciences Inst. (Foster & KesselmanKesselman))

◆ ������������������������������������� ����� ���������������� �������������½ ���� ����������������� �����������

◆ ���� ��½ �������������������������������������½ ��������������!��� �����"��#���������������������������������$�����������

½ ���� ������������������������������#!%��&

½ �'������������������'�������

http://www.globus.org/

Page 10: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

1/99

Globus ApproachGlobus Approach

◆ (�� ������������ ���� �½ )������������������������������������ �� ��

½ %��������� ������������������������������ ���

◆ �������������½ *�����������������������+½ ������������������½ � �������������������

'LYHUVH�JOREDO�VHUYLFHV

&RUH�*OREXVVHUYLFHV

/RFDO�26

$�S�S�O�L�F�D�W�L�R�Q�V�

http://www.globus.org/1/99

Globus Toolkit: Core ServicesGlobus Toolkit: Core Services

◆ ����� ����,!��� �-�� ���������.�/�����0½ /RZ�OHYHO�VFKHGXOHU�$3,

◆ 1���������,/������� ���������������������0½ 8QLIRUP�DFFHVV�WR�VWUXFWXUH�VWDWH�LQIRUPDWLRQ

◆ ���� �������,2�' 0½ 0XOWLPHWKRG�FRPPXQLFDWLRQ���4R6�PDQDJHPHQW

◆ ��� �����,!��� ���� �����1����� �� ��0½ 6LQJOH�VLJQ�RQ��NH\�PDQDJHPHQW

◆ 3������������ �,3���������������0◆ -����������������,!�����������������������������0

◆ -������������-�� �������������,!�-�0http://www.globus.org/

Page 11: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Grid Components and our Focus:Resource Brokers and Computational Economy

����������

��� �� ����

0'6

*5$0*OREXV6HFXULW\,QWHUIDFH

+HDUWEHDW0RQLWRU1H[XV

*ORSHUI

������ ����

/6)

&RQGRU 03,

14((DV\

7&3

6RODULV,UL[$,;

8'3

%���$�����������������&����

'852& JOREXVUXQ03, 1LPURG�*03,�,2 &&��

*OREXV9LHZ 7HVWEHG�6WDWXV

*$66

Source: Globus

*5$&(

*$5$

GridFabric

GridApps.

GridMiddleware

GridTools

1/99 1

NASA’s Information Power GridNASA’s Information Power Grid

◆ ��������������������������������������������������� ���������������������������������������� �� �������+����������� ��������.

◆ ���������1)!���4 ��������������������5���2����������������������������������������������������������&����������������.

◆ !����+�����������#½ $SSOLFDWLRQ�FDSDELOLWLHV

½ 'LVWULEXWHG�UHVRXUFH�DFFHVV

◆ !����� ���������#½ 6FDODELOLW\

½ 8VDELOLW\

http://nas.nasa.gov/~wej/home/IPG

Page 12: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

1/99 12

http://nas.nasa.gov/~wej/home/IPG

Nimrod/G Resource Broker

Nimrod/G Approach to ResourceManagement and Scheduling

Page 13: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

• A global scheduler for managing and steeringtask farming (parametric simulation)applications on computational grid based ondeadline and computational economy.

• Key Features– A single window to manage and control

whole experiment– Resource Discovery– Trade for Resources– Scheduling– Steering & data management

• Leverages Globus Services• Other parallel application models can be

supported easily (MPI & DUROC co-allocation)

What is Nimrod/G ? A Nimrod/G User Console

CostDeadline

AvailableMachines

Page 14: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Nimrod/G Architecture

Middleware Services

Nimrod/G Client Nimrod/G ClientNimrod/G Client

Grid Information Services

Schedule Advisor

Trading Manager

Nimrod Engine

GUSTO Test Bed

PersistentStore

Grid Explorer

GE GISTM TS

RM & TS

RM & TS

RM & TS

Dispatcher

RM: Local Resource Manager, TS: Trade Server

Nimrod/G Interactions

MDSserver

Resourcelocation

QueuingSystem

GRAMserver

Resourceallocation

(local)

Additional services used implicitly:• GSI (authentication & authorization)• Nexus (communication)

Userprocess

File accessGASSserver

Gatekeeper node

JobWrapper

Computational node

Dispatcher

Root node

Scheduler

Prmtc..Engine

Trade Server

Page 15: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Resource Location/Discovery

• Need to locate suitable machines for anexperiment

– Speed– Number of processors– Cost– Availability– User account

• Available resources will vary across experiment• Supported through Directory Server (Globus

MDS)

Computational Economy

• Resource selection on based real money andmarket based

• A large number of sellers and buyers(resources may be dedicated/shared)

• Negotiation: tenders/bids and select thoseoffers meet the requirement

• Trading and Advance Resource Reservation• Schedule computations on those resources that

meet all requirements

Page 16: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

1 3

2 1User 5

Mach

ine 1

User 1

Mach

ine 5

Cost Model

• non-uniform costing– time to time– one user to another– usage duration

• encourages use of localresources first

• user can access remoteresources, but pays apenalty in higher cost.

Nimrod/G Scheduling Algorithm

Find a set of machines (MDS search)Distribute jobs from root to machinesEstablish job consumption rate for each machineFor each machine

Can we meet deadline?If not, then return some jobs to rootIf yes, distribute more jobs to resource

If cannot meet deadline with current resourceFind additional resources

Page 17: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Graph 2 - GUSTO Usage for Ionization Chamber Study

0

10

20

30

40

50

60

70

80

0 2.5 5 7.5 10 12.5 15 17.5 20

Time

Ave

rag

eN

o. P

roce

sso

rs

20 Hour deadline15 hour deadline10 hour deadline

Resource Usage (for various deadlines)Graph 3 - GUSTO Usage for 20 Hour Deadline

0

2

4

6

8

10

12

14

16

18

20

0 2.5 5 7.5 10 12.5 15 17.5 20

Time

Ave

rag

e N

o P

roce

sso

rs

5 CUs

10 CUs

15 CUs

20 CUs

50 CUs

5 Cost Units

10 Cost Units

Page 18: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Graph 4 - GUSTO Usage for 15 Hour Deadline

0

2

4

6

8

10

12

14

16

18

20

0 2.5 5 7.5 10 12.5 15 17.5 20

Time

Ave

rag

e N

o P

roce

ssor

s

5 CUs

10 CUs

15 CUs

20 CUs

50 CUs

5 Cost Units

50 Cost Units

15 Cost Units

10 Cost Units

Graph 5 - GUSTO Usage for 10 Hour Deadline

0

5

10

15

20

25

30

35

0 2.5 5 7.5 10 12.5 15 17.5 20

Time

No

Pro

ces

ses 5 CUs

10 CUs

15 CUs

20 CUs

50 CUs

10 Cost Units

50 Cost Units

20 Cost Units

5 Cost Units

15 Cost Units

Page 19: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Scheduling Methods for GlobalResource Allocation

1. Equal Assignment, but no Load Balancing2. Equal Assignment and Load Balanced3. (2) + deadline (no worry about cost)4. (2) + deadline (minimize the cost of computation)5. (4) + budget --> deadline + cost (up front agreement)

– I am willing to pay $$$, can you complete by deadline.

6. (5) + Advance Resource Reservation6. (6) + Grid Economy - dynamic pricing - use

Trading/Tender/Bid process7. (6) + Grid Economy - dynamic pricing - use auction

technique8. Genetic/Fuzzy Logic, etc algorithms

GRACE

Grid Architecture for Computational Economy

• GRACE aims help Nimrod/G overcome the currentlimitations.

• GRACE middleware offer generic interfaces (APIs) thatother developers of grid tools can use along withGlobus services.

Page 20: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Grid Resource Management Architecture(Economic/Market-based)

User

Application

Resource Broker

A Resource Domain

Grid Explorer

Schedule Advisor

Trade Manager

JobControlAgent

Deployment Agent

Trade Server

Resource Allocation

ResourceReservation

R1

Othercomponents

of localresourcemanager

Trading

Grid Information Server

R2 Rn…

Why Computational Economy inResource Management ?

“Observe Grid characteristics and current resourcemanagement policies”

• Grid resources are not owned by user or single organisation.• They have their own administrative policy• Mismatch in resource demand and supply

– overall resource demand may exceed supply.• Traditional System-centric (performance matrix approaches

does not suit in grid environment.– System-Centric --> User Centric

• Like in real life, economic-based approach is one of the bestways to regulate selection and scheduling on the grid as itcaptures user-intent.

• Markets are an effective institution in coordinating theactivities of several entities.

Page 21: “Internet-Wide Global Supercomputing” Agendaraj/talks/auug2000.pdf · computer modeling, simulation and analysis Life Sciences CAD/CAM Aerospace Digital Biology Military Applications

Conclusions

• The Emergence of Internet as a Powerfulconnectivity media is bridging the gap between anumber of technologies leading to what is known as“Everything on IP”.

• Cluster-based systems have become a platform ofchoice for mainstream computing.

• Economic based approach to resource managementis the way to go in the grid environment.

• The user can say “I am willing to pay $…, can youcomplete my job by this time…”

• Both sequential and parallel applications runseamless on desktops, SMPs, Clusters, and the Gridwithout any change.

• Grid: A Next Generation Internet ?

Further Information

• Cluster Computing Infoware:– http://www.buyya.com/cluster/

• Grid Computing Infoware:– http://www.gridcomputing.com

• Millennium Compute Power Grid/Market Project– http://www.ComputePower.com– You are invited to join CPG project!

• Books:– High Performance Cluster Computing, V1, V2, R.Buyya (Ed), Prentice

Hall, 1999.– The GRID, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann, 1999.

• IEEE Task Force on Cluster Computing– http://www.ieeetfcc.org

• GRID Forums– http://www.gridforum.org | http://www.egrid.org

• GRID’2000 Meeting– http://www.gridcomputing.org