Grid Computing: Concepts, Applications, and Technologies

Page 1: Grid Computing: Concepts, Applications, and Technologies

Grid Computing: Concepts, Applications, and Technologies

Dheeraj Bhardwaj
Department of Computer Science and Engineering
Indian Institute of Technology, Delhi

Page 2: Grid Computing: Concepts, Applications, and Technologies

[email protected] IIT DELHI

Outline

The technology landscape
Grid computing
The Globus Toolkit
Applications and technologies
– Data-intensive; distributed computing; collaborative; remote access to facilities
Grid infrastructure
Open Grid Services Architecture
Global Grid Forum
Summary and conclusions

Page 4: Grid Computing: Concepts, Applications, and Technologies

Living in an Exponential World: (1) Computing & Sensors

Moore’s Law: transistor count doubles every 18 months

[Figure: magnetohydrodynamics; star formation]

Page 5: Grid Computing: Concepts, Applications, and Technologies

Living in an Exponential World: (2) Storage

Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
– 2000: ~0.5 petabyte
– 2005: ~10 petabytes
– 2010: ~100 petabytes
– 2015: ~1000 petabytes?
Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?
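As a rough check on the estimates above, the following sketch computes the doubling time implied by each pair of consecutive data points. The function names are illustrative only, and the calculation assumes smooth exponential growth between the quoted years, which the slide itself does not claim.

```python
import math

# Online-data estimates quoted on the slide, in petabytes (year -> size).
ONLINE_DATA_PB = {2000: 0.5, 2005: 10.0, 2010: 100.0, 2015: 1000.0}


def implied_doubling_time_years(size_start, size_end, years):
    """Doubling time implied by growing from size_start to size_end in `years`.

    Assumes pure exponential growth over the interval.
    """
    return years / math.log2(size_end / size_start)


def doubling_times(series):
    """Implied doubling time for each consecutive pair of (year, size) estimates."""
    points = sorted(series.items())
    return {
        (y0, y1): implied_doubling_time_years(s0, s1, y1 - y0)
        for (y0, s0), (y1, s1) in zip(points, points[1:])
    }


if __name__ == "__main__":
    for (y0, y1), t in doubling_times(ONLINE_DATA_PB).items():
        print(f"{y0}-{y1}: online data doubles roughly every {t:.2f} years")
```

Under that assumption, the quoted volumes imply a doubling time somewhat longer than the 12 months stated for storage density, which is plausible since density and total online data are different quantities.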

Page 6: Grid Computing: Concepts, Applications, and Technologies

Data Intensive Physical Sciences

High energy & nuclear physics
– Including new experiments at CERN
Gravity wave searches
– LIGO, GEO, VIRGO
Time-dependent 3-D systems (simulation, data)
– Earth observation, climate modeling
– Geophysics, earthquake modeling
– Fluids, aerodynamic design
– Pollutant dispersal scenarios
Astronomy: digital sky surveys

Page 7: Grid Computing: Concepts, Applications, and Technologies

Ongoing Astronomical Mega-Surveys

Large number of new surveys
– Multi-TB in size, 100M objects or larger
– In databases
– Individual archives planned and under way
Multi-wavelength view of the sky
– > 13 wavelength coverage within 5 years
Impressive early discoveries
– Finding exotic objects by unusual colors
  > L, T dwarfs, high-redshift quasars
– Finding objects by time variability
  > Gravitational micro-lensing

Surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...

Page 8: Grid Computing: Concepts, Applications, and Technologies

Coming Floods of Astronomy Data

The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008!
– All-sky survey every few days, so will have fine-grain time series for the first time

Page 9: Grid Computing: Concepts, Applications, and Technologies

Data Intensive Biology and Medicine

Medical data
– X-ray, mammography data, etc. (many petabytes)
– Digitizing patient records (ditto)
X-ray crystallography
Molecular genomics and related disciplines
– Human Genome, other genome databases
– Proteomics (protein structure, activities, …)
– Protein interactions, drug delivery
Virtual Population Laboratory (proposed)
– Simulate likely spread of disease outbreaks
Brain scans (3-D, time dependent)

Page 10: Grid Computing: Concepts, Applications, and Technologies

A Brain Is a Lot of Data! (Mark Ellisman, UCSD)

And comparisons must be made among many

We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns. Grids will help get us there and further.

Page 11: Grid Computing: Concepts, Applications, and Technologies

An Exponential World: (3) Networks (Or, Coefficients Matter …)

Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
1986 to 2000
– Computers: x 500
– Networks: x 340,000
2001 to 2010
– Computers: x 60
– Networks: x 4000

[Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett; source Vinod Khosla, Kleiner Perkins Caufield & Byers.]
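The "coefficients matter" point can be checked with a few lines of arithmetic. The sketch below assumes idealized, constant doubling periods of 18 and 9 months; the function names are illustrative and not from the slides.

```python
COMPUTER_DOUBLING_MONTHS = 18  # from the slide
NETWORK_DOUBLING_MONTHS = 9    # from the slide


def growth_factor(months_elapsed, doubling_months):
    """Multiplicative growth after months_elapsed, given a fixed doubling period."""
    return 2 ** (months_elapsed / doubling_months)


def compare(years):
    """Return (computer growth, network growth, network/computer ratio) over `years`."""
    months = 12 * years
    computers = growth_factor(months, COMPUTER_DOUBLING_MONTHS)
    networks = growth_factor(months, NETWORK_DOUBLING_MONTHS)
    return computers, networks, networks / computers


if __name__ == "__main__":
    for years in (5, 9, 14):
        c, n, ratio = compare(years)
        print(f"{years:>2} years: computers x{c:,.0f}, networks x{n:,.0f}, gap x{ratio:,.0f}")
```

Over 5 years the gap comes out at roughly a factor of 10, matching the "order of magnitude per 5 years" bullet. Over 9 years the idealized rates give x64 and x4096, close to the slide's x60 and x4000 for 2001 to 2010. The 1986 to 2000 figures on the slide are of the same order as, but not identical to, what the idealized rates give for 14 years.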

Page 13: Grid Computing: Concepts, Applications, and Technologies

Evolution of the Scientific Process

Pre-electronic
– Theorize &/or experiment, alone or in small teams; publish paper
Post-electronic
– Construct and mine very large databases of observational or simulation data
– Develop computer simulations & analyses
– Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams

Page 14: Grid Computing: Concepts, Applications, and Technologies

Evolution of Business

Pre-Internet
– Central corporate data processing facility
– Business processes not compute-oriented
Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Outsourcing becomes feasible => service providers of various sorts
– Business processes increasingly computing- and data-rich

Page 15: Grid Computing: Concepts, Applications, and Technologies

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

Page 16: Grid Computing: Concepts, Applications, and Technologies

A Comparison

SERIAL
– Fetch/Store
– Compute

PARALLEL
– Fetch/Store
– Compute/Communicate
– Cooperative game

GRID
– Fetch/Store
– Discovery of resources
– Interaction with remote application
– Authentication/Authorization
– Security
– Compute/Communicate
– Etc.

Page 18: Grid Computing: Concepts, Applications, and Technologies

Distributed Computing vs. GRID

Grid is an evolution of distributed computing
– Dynamic
– Geographically independent
– Built around standards
– Internet backbone
Distributed computing is an “older term”
– Typically built around proprietary software and network
– Tightly coupled systems/organization

Page 19: Grid Computing: Concepts, Applications, and Technologies

Web vs. GRID

Web: uniform naming access to documents
Grid: uniform, high-performance access to computational resources

[Figure labels: Colleges/R&D Labs; Software Catalogs; Sensor nets; http://]

Page 20: Grid Computing: Concepts, Applications, and Technologies

Is the World Wide Web a Grid?

Seamless naming? Yes
Uniform security and authentication? No
Information service? Yes or No
Co-scheduling? No
Accounting & authorization? No
User services? No
Event services? No
Is the browser a global shell? No

Page 21: Grid Computing: Concepts, Applications, and Technologies

What does the World Wide Web bring to the Grid?

Uniform naming
A seamless, scalable information service
A powerful new meta-data language: XML
– XML will be standard language for describing information in the grid
– SOAP (Simple Object Access Protocol)
  > Uses XML for encoding, HTTP for protocol
– SOAP may become a standard RPC mechanism for Grid services
Portal ideas
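To make the SOAP bullet concrete, here is a minimal sketch of building an XML-encoded SOAP request of the kind that would typically be sent in an HTTP POST. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `submitJob` operation, and its parameters are hypothetical and do not correspond to any real Grid service. No network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-job-service"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as text) wrapping one RPC-style call.

    `operation` and the keys of `params` become XML elements in SERVICE_NS.
    """
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical remote call: run /bin/hostname on 4 processors.
    print(build_soap_request("submitJob", {"executable": "/bin/hostname", "count": 4}))
```

The call name and arguments travel as ordinary XML elements inside the envelope body, which is what makes SOAP usable as a language-neutral RPC mechanism.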

Page 22: Grid Computing: Concepts, Applications, and Technologies

The Ultimate Goal

In the future, I will not know or care where my application is executed, because I will acquire and pay for these resources as I need them.

Page 23: Grid Computing: Concepts, Applications, and Technologies

Why Grids?

Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.

The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.

Page 24: Grid Computing: Concepts, Applications, and Technologies

An Example Virtual Organization: CERN’s Large Hadron Collider

1800 Physicists, 150 Institutes, 32 Countries

100 PB of data by 2010; 50,000 CPUs?

Page 25: Grid Computing: Concepts, Applications, and Technologies

Grid Communities & Applications: Data Grids for High Energy Physics

[Figure: tiered data grid for LHC experiments. Recoverable labels:]
Components: Online System; Offline Processor Farm (~20 TIPS); CERN Computer Centre; FermiLab (~4 TIPS); France, Italy, and Germany Regional Centres; Tier2 Centres (~1 TIPS each, including Caltech ~1 TIPS); Institutes (~0.25 TIPS) with physics data cache; Physicist workstations
Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
Link rates: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec (or Air Freight, deprecated); ~1 MBytes/sec

There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

1 TIPS is approximately 25,000 SpecInt95 equivalents

www.griphyn.org www.ppdg.net www.eu-datagrid.org
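The trigger figures quoted on this slide can be turned into data rates with simple arithmetic. The sketch below is illustrative: the function names are invented here, and the yearly-volume estimate rests on an assumed 10^7 seconds of data taking per year, a figure that does not appear in the slides.

```python
BUNCH_CROSSING_INTERVAL_NS = 25   # from the slide
TRIGGERS_PER_SECOND = 100         # from the slide
EVENT_SIZE_MB = 1.0               # from the slide (~1 MByte per triggered event)

# ASSUMPTION (not from the slides): effective data-taking time per year.
SECONDS_OF_DATA_TAKING_PER_YEAR = 1e7
MB_PER_PB = 1e9                   # 1 PB = 1000 TB = 1,000,000 GB = 10^9 MB


def bunch_crossing_rate_hz(interval_ns):
    """Number of bunch crossings per second for a given crossing interval."""
    return 1e9 / interval_ns


def recorded_rate_mb_per_s(triggers_per_second, event_size_mb):
    """Data rate of triggered (recorded) events in MB/s."""
    return triggers_per_second * event_size_mb


def crossings_per_trigger(interval_ns, triggers_per_second):
    """How many bunch crossings occur for each event that is kept."""
    return bunch_crossing_rate_hz(interval_ns) / triggers_per_second


def yearly_volume_pb(rate_mb_per_s, seconds_per_year=SECONDS_OF_DATA_TAKING_PER_YEAR):
    """Recorded volume per year in petabytes, under the assumed running time."""
    return rate_mb_per_s * seconds_per_year / MB_PER_PB


if __name__ == "__main__":
    rate = recorded_rate_mb_per_s(TRIGGERS_PER_SECOND, EVENT_SIZE_MB)
    print(f"Crossing rate: {bunch_crossing_rate_hz(BUNCH_CROSSING_INTERVAL_NS) / 1e6:.0f} MHz")
    print(f"Crossings per kept event: {crossings_per_trigger(25, 100):,.0f}")
    print(f"Recorded rate: {rate:.0f} MB/s")
    print(f"Volume per year (assumed running time): {yearly_volume_pb(rate):.1f} PB")
```

The recorded rate of 100 MB/s agrees with the ~100 MBytes/sec link label in the figure, and the ratio of crossings to kept events shows how aggressively the trigger filters the raw stream.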

Page 26: Grid Computing: Concepts, Applications, and Technologies

Intelligent Infrastructure: Distributed Servers and Services

Page 27: Grid Computing: Concepts, Applications, and Technologies

The Grid: A Brief History

Early 90s
– Gigabit testbeds, metacomputing
Mid to late 90s
– Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
2002
– Dozens of application communities & projects
– Major infrastructure deployments
– Significant technology base (esp. Globus Toolkit™)
– Growing industrial interest
– Global Grid Forum: ~500 people, 20+ countries

Page 28: Grid Computing: Concepts, Applications, and Technologies

The Grid World: Current Status

Dozens of major Grid projects in scientific & technical computing/research & education
– www.mcs.anl.gov/~foster/grid-projects
Considerable consensus on key concepts and technologies
– Open source Globus Toolkit™ a de facto standard for major protocols & services
Industrial interest emerging rapidly
– IBM, Platform, Microsoft, Sun, Compaq, …
Opportunity: convergence of eScience and eBusiness requirements & technologies

Page 30: Grid Computing: Concepts, Applications, and Technologies

Grid Technologies: Resource Sharing Mechanisms That …

Address security and policy concerns of resource owners and users
Are flexible enough to deal with many resource types and sharing modalities
Scale to large numbers of resources, many participants, many program components
Operate efficiently when dealing with large amounts of data & computation

Page 31: Grid Computing: Concepts, Applications, and Technologies

Aspects of the Problem

1) Need for interoperability when different groups want to share resources
– Diverse components, policies, mechanisms
– E.g., standard notions of identity, means of communication, resource descriptions
2) Need for shared infrastructure services to avoid repeated development, installation
– E.g., one port/service/protocol for remote access to computing, not one per tool/application
– E.g., Certificate Authorities: expensive to run
A common need for protocols & services

Page 32: Grid Computing: Concepts, Applications, and Technologies

The Hourglass Model

Focus on architecture issues
– Propose set of core services as basic infrastructure
– Use to construct high-level, domain-specific solutions
Design principles
– Keep participation cost low
– Enable local control
– Support for adaptation
– “IP hourglass” model

[Figure: hourglass, top to bottom: Applications; Diverse global services; Core services; Local OS]

Page 33: Grid Computing: Concepts, Applications, and Technologies

Layered Grid Architecture (By Analogy to Internet Architecture)

Grid layers, top to bottom:
Application
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources

Internet Protocol Architecture, top to bottom:
Application
Transport
Internet
Link

Page 34: Grid Computing: Concepts, Applications, and Technologies

Globus Toolkit™

A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
– Offer a modular set of orthogonal services
– Enable incremental development of grid-enabled tools and applications
– Implement standard Grid protocols and APIs
– Available under liberal open source license
– Large community of developers & users
– Commercial support

Page 35: Grid Computing: Concepts, Applications, and Technologies

General Approach

Define Grid protocols & APIs
– Protocol-mediated access to remote resources
– Integrate and extend existing standards
– “On the Grid” = speak “Intergrid” protocols
Develop a reference implementation
– Open source Globus Toolkit
– Client and server SDKs, services, tools, etc.
Grid-enable wide variety of tools
– Globus Toolkit, FTP, SSH, Condor, SRB, MPI, …
Learn through deployment and applications

Page 36: Grid Computing: Concepts, Applications, and Technologies

Key Protocols

The Globus Toolkit™ centers around four key protocols
– Connectivity layer:
  > Security: Grid Security Infrastructure (GSI)
– Resource layer:
  > Resource Management: Grid Resource Allocation Management (GRAM)
  > Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP)
  > Data Transfer: Grid File Transfer Protocol (GridFTP)
Also key collective layer protocols
– Info Services, Replica Management, etc.
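The layer-to-protocol mapping on this slide can be written down as a small lookup table. This is only a data-structure sketch for keeping the names straight: the `GridProtocol` class and helper function are invented for illustration and are not part of the Globus Toolkit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridProtocol:
    """One protocol entry as listed on the slide (illustrative record type)."""
    layer: str      # architecture layer the protocol belongs to
    function: str   # what the protocol is used for
    name: str       # full name as given on the slide
    acronym: str


# Entries transcribed from the slide; nothing here is queried from real software.
KEY_PROTOCOLS = (
    GridProtocol("Connectivity", "Security", "Grid Security Infrastructure", "GSI"),
    GridProtocol("Resource", "Resource Management", "Grid Resource Allocation Management", "GRAM"),
    GridProtocol("Resource", "Information Services", "Grid Resource Information Protocol", "GRIP"),
    GridProtocol("Resource", "Information Services", "Index Information Protocol", "GIIP"),
    GridProtocol("Resource", "Data Transfer", "Grid File Transfer Protocol", "GridFTP"),
)


def protocols_by_layer(protocols):
    """Group protocol acronyms by layer, preserving the listed order."""
    grouped = {}
    for protocol in protocols:
        grouped.setdefault(protocol.layer, []).append(protocol.acronym)
    return grouped


if __name__ == "__main__":
    for layer, acronyms in protocols_by_layer(KEY_PROTOCOLS).items():
        print(f"{layer} layer: {', '.join(acronyms)}")
```

Grouping by function rather than by acronym gives the four areas the slide counts: security, resource management, information services, and data transfer.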