Transcript of "Computing and Data Grids for Science and Engineering"

Page 1

Computing and Data Grids for Science and Engineering

www.ipg.nasa.gov
doesciencegrid.org

William E. Johnston
http://www-itg.lbl.gov/~wej/

Computational Research Division, DOE Lawrence Berkeley National Laboratory

and

NASA Advanced Supercomputing (NAS) Division, NASA Ames Research Center

6/24/03

Page 2

The Process of Large-Scale Science is Changing

• Large-scale science and engineering problems require collaborative use of many compute, data, and instrument resources, all of which must be integrated with application components and data sets that are
  – developed by independent teams of researchers
  – or obtained from multiple instruments
  – at different geographic locations

The evolution to this circumstance is what has driven my interest in high-speed distributed computing – now Grids – for 15 years.

E.g., see [1], and below.

Page 3

Complex Infrastructure is Needed for Supernova Cosmology

Page 4

The Complexity of a "Complete" Approach to Climate Modeling – Terrestrial Biogeoscience Involves Many Interacting Processes and Data

[Figure: The diagram links Climate (temperature, precipitation, radiation, humidity, wind) and Chemistry (CO2, CH4, N2O, ozone, aerosols) with Biogeophysics (energy, water, aerodynamics, microclimate, canopy physiology), the Hydrologic Cycle (evaporation, transpiration, snow melt, infiltration, runoff, soil water, snow, intercepted water, watersheds, surface and subsurface water, geomorphology), Biogeochemistry (carbon assimilation, gross primary production, plant and microbial respiration, mineralization, decomposition, nutrient availability), Vegetation Dynamics and Ecosystems (species composition, ecosystem structure), Phenology (bud break, leaf senescence), and Disturbance (fires, hurricanes, ice storms, windthrows), exchanging CO2, CH4, N2O, VOCs, dust, heat, moisture, and momentum over timescales from minutes-to-hours and days-to-weeks to years-to-centuries.]

(Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.)

Page 5

Cyberinfrastructure for Science

• Such complex and data-intensive scenarios require sophisticated, integrated, and high-performance infrastructure to provide the
  – resource sharing and distributed data management,
  – collaboration, and
  – application frameworks

that are needed to successfully manage and carry out the many operations needed to accomplish the science

• This infrastructure involves
  – high-speed networks and services
  – very high-speed computers and large-scale storage
  – highly capable middleware, including support for distributed data management and collaboration

Page 6

The Potential Impact of Grids

• A set of high-impact science applications in the areas of
  – high energy physics
  – climate
  – chemical sciences
  – magnetic fusion energy

have been analyzed [2] to characterize their visions for the future process of science – how science must be done in the future in order to make significant progress

Page 7

The Potential Impact of Grids

• These case studies indicate that there is a great deal of commonality in the infrastructure required in every case to support those visions – including a common set of Grid middleware

• Further, Grids are maturing to the point where they provide useful infrastructure for solving the computing, collaboration, and data problems of science application communities (e.g., as illustrated by the case studies below)

Page 8

Grids: Highly Capable Middleware

• Core Grid services / Open Grid Services Infrastructure (OGSI)
  – Provide the consistent, secure, and uniform foundation for managing dynamic and administratively heterogeneous pools of compute, data, and instrument resources

• Higher-level services / Open Grid Services Architecture (OGSA)
  – Provide value-added, complex, and aggregated services to users and application frameworks
  – E.g., information management: Grid Data services that will provide a consistent and versatile view of data – real and virtual – of all descriptions

Page 9

[Figure: The DOE Science Grid – Grid-managed resources (NERSC supercomputing and large-scale storage; PNNL; LBNL; ANL; ORNL; scientific instruments such as a supernova observatory and a synchrotron light source) connected by ESnet to Europe and Asia-Pacific, with an ESnet X.509 Certificate Authority. Layered on these resources are Core Grid Services / OGSI (uniform access to distributed resources: systems management and access, communication services, authentication and authorization security services, Grid Information Service, uniform computing access, Unix and OGSI hosting, global event services, auditing, monitoring, co-scheduling, uniform data access), Higher-level Services / OGSA (Data Grid services, workflow management, visualization, data publication/subscription, brokering, job management, fault management, Grid system administration, etc.), Application Frameworks (e.g., XCAT, SciRun) and Portal Toolkits (e.g., XPortlets), Applications (simulations, data analysis, etc.), and User Interfaces.]

Funded by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research, Mathematical, Information, and Computational Sciences Division.

Page 10

Grids: Highly Capable Middleware

Also ….

• Knowledge management
  – Services for unifying, classifying, and "reasoning about" services, data, and information in the context of a human-centric problem solving environment – the Semantic Grid
  – Critical for building problem solving environments that
    • let users ask "what if" questions
    • ease the construction of multidisciplinary systems
  by providing capabilities so that the user does not have to be an expert in all of the disciplines to build a multidisciplinary system

Page 11

Grid Middleware

• Grids are also
  – A worldwide collection of researchers and developers
  – Several hundred people from the US, Europe, and SE Asian countries working on best practice and standards at the Global Grid Forum (www.gridforum.org)
  – A major industry effort to combine Grid Services and Web Services (IBM, HP, Microsoft) (e.g., see [3])
  – Vendor support from dozens of IT companies

Page 12

Web Services and Grids

• Web services provide for
  – Describing services (programs) with sufficient information that they can be discovered and combined to make new applications (reusable components)
  – Assembling groups of discovered services into useful problem solving systems
  – Easy integration with scientific databases that use XML-based metadata

Page 13

Web Services and Grids

• So …
  – Web Services provide for defining, accessing, and managing services

while

  – Grids provide for accessing and managing dynamically constructed, distributed compute and data systems, and provide support for collaborations / Virtual Organizations

Page 14

Combining Web Services and Grids

• Combining Grid and Web services will provide a dynamic and powerful computing and data system that is rich in descriptions, services, data, and computing capabilities

• This infrastructure will give us the basic tools to deal with complex, multi-disciplinary, data-rich science models by providing
  – a standard way to define the interfaces and data
  – the infrastructure to interconnect those interfaces in a distributed computing environment

Page 15

Combining Web Services and Grids

• This ability to utilize distributed services is important in science because highly specialized code and data are maintained by specialized research groups in their own environments, and it is neither practical nor desirable to bring all of these together on a single system

• The Terrestrial Biogeoscience climate system is an example where all of the components will probably never run on the same system – there will be many sub-models and associated data that are built and maintained in specialized environments

Page 16

Terrestrial Biogeoscience – A "Complete" Approach to Climate Modeling – Involves Many Complex, Interacting Processes and Data

[Figure: the same Terrestrial Biogeoscience process diagram shown on Page 4.]

(Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.)

Page 17

Combining Web Services and Grids

• The complexity of the modeling done in Terrestrial Biogeoscience is a touchstone for this stage of evolution of Grids and Web Services – this is one of the problems to solve in order to provide a significant increase in capabilities for science

• Integrating Grids and Web Services is a major thrust at GGF – e.g., in the OGSI and Open Grid Services Architecture Working Groups. Also see http://www.globus.org/ogsa/

Page 18

The State of Grids

• Persistent infrastructure is being built – this is happening, e.g., in
  – DOE Science Grid
  – NASA's IPG
  – Committee on Earth Observation Satellites (CEOS) Grid
  – EU Data Grid
  – UK eScience Grid
  – NSF TeraGrid
  – NEESgrid (National Earthquake Engineering Simulation Grid)

all of which are focused on large-scale science and engineering

Page 19

The State of Grids – Some Case Studies

• Further, Grids are becoming a critical element of many projects – e.g.,
  – The High Energy Physics problem of managing and analyzing petabytes of data per year has driven the development of Grid Data Services
  – The National Earthquake Engineering Simulation Grid has developed a highly application-oriented approach to using Grids
  – The Astronomy data federation problem has promoted work in Web Services based interfaces

Page 20

High Energy Physics Data Management

• Petabytes of data per year must be distributed to hundreds of sites around the world for analysis

• This involves
  – Reliable, wide-area, high-volume data management
  – Global naming, replication, and caching of datasets
  – Easily accessible pools of computing resources

• Grids have been adopted as the infrastructure for this HEP data problem
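A toy Python sketch of the global naming / replication / caching idea above. The logical file names, sites, and URLs are made up, and this is not the actual Globus replica catalog interface; it only illustrates resolving a logical name to a nearby physical replica, consulting a local cache first.

```python
# Illustrative sketch: a logical file name maps to multiple physical replicas;
# prefer a cached copy, then a replica at the local site, then any replica.

REPLICAS = {
    "lfn:/cms/run2003/events-000123.root": [
        ("cern.ch",  "gsiftp://castor.cern.ch/cms/events-000123.root"),
        ("fnal.gov", "gsiftp://enstore.fnal.gov/cms/events-000123.root"),
        ("in2p3.fr", "gsiftp://hpss.in2p3.fr/cms/events-000123.root"),
    ],
}

LOCAL_CACHE = {}  # logical name -> physical URL of the locally cached copy

def resolve(lfn, local_domain):
    """Return a physical URL for a logical file name."""
    if lfn in LOCAL_CACHE:
        return LOCAL_CACHE[lfn]
    replicas = REPLICAS.get(lfn, [])
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    for domain, url in replicas:
        if domain == local_domain:          # a replica already at this site
            return url
    domain, url = replicas[0]               # otherwise fall back to any replica
    LOCAL_CACHE[lfn] = url                  # record it as the cached copy
    return url

if __name__ == "__main__":
    print(resolve("lfn:/cms/run2003/events-000123.root", "fnal.gov"))
```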

Page 21

High Energy Physics Data Management – CERN / LHC Data: One of Science's Most Challenging Data Management Problems

[Figure: The CMS data distribution hierarchy. The online system and event reconstruction at CERN form Tier 0+1; data flow at roughly 2.5 Gbits/sec to Tier 1 regional centers (French, German, and Italian regional centers, and FermiLab in the USA), at ~0.6-2.5 Gbps from Tier 1 to Tier 2 centers (which also run event simulation), and at 100-1000 Mbits/sec on to Tier 3 institutes (~0.25 TIPS each) and Tier 4 workstations, with physics data caches supporting analysis. Other bandwidth labels in the figure are ~100 MBytes/sec and ~PByte/sec at the detector/online-system end.]

CERN/CMS data goes to 6-8 Tier 1 regional centers, and from each of these to 6-10 Tier 2 centers.

Physicists work on analysis "channels" at 135 institutes. Each institute has ~10 physicists working on one or more channels.

2000 physicists in 31 countries are involved in this 20-year experiment, in which DOE is a major player.

The CERN LHC CMS detector is 15m x 15m x 22m, 12,500 tons, $700M (human figure = 2m for scale).

(Courtesy Harvey Newman, CalTech)

Page 22

High Energy Physics Data Management

• Virtual data catalogues and on-demand data generation have turned out to be an essential aspect
  – Some types of analysis are pre-defined and catalogued prior to generation, and the data products are then generated on demand when the virtual data catalogue is accessed
  – Sometimes regenerating derived data is faster and easier than trying to store and/or retrieve that data from remote repositories
  – For similar reasons this is also of great interest to the EOS (Earth Observing Satellite) community

Page 23

US-CMS/LHC Grid Data Services Testbed: International Virtual Data Grid Laboratory

[Figure: Architecture of the virtual data system. An interactive user's data generation request passes through virtual data tools, planning & scheduling tools, and execution & management tools, which apply transforms to raw data sources on distributed resources (code, storage, CPUs, networks). Metadata catalogues and virtual data catalogues hold the metadata description of the analyzed data; the whole system rests on core Grid services plus resource management, security and policy, and other Grid services.]

Page 24

CMS Event Simulation Production using GriPhyN Data Grid Services

• Production run on the Integration Testbed (400 CPUs at 5 sites)
  – Simulate 1.5 million full CMS events for physics studies
  – 2 months of continuous running across 5 testbed sites
  – Managed by a single person at the US-CMS Tier 1 site

• Nearly 30 CPU-years delivered 1.5 million events to CMS physicists

Page 25

Partnerships with the Japanese Science Community

• Comments of Paul Avery [email protected], director of iVDGL
  – iVDGL is specifically interested in partnering with the Japanese HEP community, and hopefully the National Research Grid Initiative will open doors for collaboration
  – Science drivers are critical – existing international HEP collaborations in Japan provide natural drivers
  – Different Japanese groups could participate in existing or developing Grid application-oriented testbeds, such as the ones developed in iVDGL for the different HEP experiments
    • These testbeds have been very important for debugging Grid software while serving as training grounds for existing participants and new groups, both at universities and national labs
  – Participation in and development of ultra-speed networking projects provides collaborative opportunities in a crucial related area. There are a number of new initiatives that are relevant

• Contact Harvey B Newman <[email protected]> for a fuller description and resource materials.

Page 26

National Earthquake Engineering Simulation Grid

• NEESgrid will link earthquake researchers across the U.S. with leading-edge computing resources and research equipment, allowing collaborative teams (including remote participants) to plan, perform, and publish their experiments

• Through NEESgrid, researchers will
  – perform tele-observation and tele-operation of experiments – shake tables, reaction walls, etc.;
  – publish to, and make use of, a curated data repository using standardized markup;
  – access computational resources and open-source analytical tools;
  – access collaborative tools for experiment planning, execution, analysis, and publication

Page 27

NEES Sites

• Shake Table Research Equipment
  – University at Buffalo, State University of New York
  – University of Nevada, Reno
  – *University of California, San Diego

• Centrifuge Research Equipment
  – *University of California, Davis
  – Rensselaer Polytechnic Institute

• Tsunami Wave Basin
  – *Oregon State University, Corvallis, Oregon

• Large-Scale Lifeline Testing
  – Cornell University

• Large-Scale Laboratory Experimentation Systems
  – University at Buffalo, State University of New York
  – *University of California at Berkeley
  – *University of Colorado, Boulder
  – University of Minnesota-Twin Cities
  – Lehigh University
  – University of Illinois, Urbana-Champaign

• Field Experimentation and Monitoring Installations
  – *University of California, Los Angeles
  – *University of Texas at Austin
  – Brigham Young University

Page 28

NEESgrid Earthquake Engineering Collaboratory

[Figure: High-performance network(s) connecting field equipment, laboratory equipment, instrumented structures and sites, large-scale computation, a curated data repository, a simulation tools repository, remote users (including K-12 faculty and students), and global connections.]

Page 29

NEESgrid Approach

• Package a set of application-level services and the supporting Grid software in a single "point of presence" (POP)

• Deploy the POP to a select set of earthquake engineering sites to provide the applications, data archiving, and Grid services

• Assist in developing common metadata so that the various instruments and simulations can work together

• Provide the required computing and data storage infrastructure

Page 30

NEESgrid Multi-Site Online Simulation (MOST)

• A partnership between the NEESgrid team and the UIUC and Colorado Equipment Sites to showcase NEESgrid capabilities

• A large-scale experiment conducted in multiple geographical locations which combines physical experiments with numerical simulation in an interchangeable manner

• The first integration of NEESgrid services with application software developed by earthquake engineers (UIUC, Colorado, and USC) to support a real EE experiment

• See http://www.neesgrid.org/most/

Page 31

NEESgrid Multi-Site Online Simulation (MOST)

[Figure: Photographs of the U. Colorado experimental setup and the UIUC experimental setup.]

Page 32

Multi-Site, On-Line Simulation Test (MOST)

[Figure: The MOST configuration – an experimental model at Colorado and an experimental model at UIUC, each subjected to ground excitation with measured forces and displacements, coupled through NEESpops to a computational model at NCSA and coordinated by the simulation coordinator.]

UIUC MOST-SIM: Dan Abrams, Amr Elnashai, Dan Kuchma, Bill Spencer, and others. Colorado FHT: Benson Shing and others.

Page 33

1994 Northridge Earthquake Simulation Requires a Complex Mix of Data and Models

[Figure: Bridge piers #5 through #8 and the corresponding structural models used in the simulation. Amr Elnashai, UIUC.]

NEESgrid provides the common data formats, uniform data archive interfaces, and computational services needed to support this multidisciplinary simulation.

Page 34

NEESgrid Architecture

[Figure: NEES distributed resources (laboratory equipment with data acquisition systems, instrumented structures and sites, large-scale computation, large-scale storage, a curated data repository, and a simulation tools repository) connected through a NEESpop hosting Grid services – GridFTP, metadata services, accounts & MyProxy, NEESgrid monitoring, the NEESgrid streaming data system, video services, e-notebook services, and the CompreHensive collaborativE Framework (CHEF) – plus a simulation coordinator. User interfaces are web browsers and Java applets; NEES operations support experiments, multidisciplinary simulations, and collaborations.]

Page 35

Partnerships with the Japanese Science Community

• Comments of Daniel Abrams <[email protected]>, Professor of Civil Engineering, University of Illinois and NEESgrid project manager
  – The Japanese earthquake research community has expressed interest in NEESgrid
  – I am aware of some developmental efforts between one professor and another to explore the feasibility of on-line pseudodynamic testing – Professor M. Watanabe at the University of Kyoto is running a test in his lab which is linked with another test running at KAIST (in Korea) with Professor Choi. They are relying on the internet for transmission of signals between their labs.
  – International collaboration with the new shaking table at Miki is being encouraged, and thus they are interested in plugging in to an international network. There is interest in installing a NEESpop there so that its utility could be evaluated and connections made with the NEESgrid sites.
  – We already have some connection to the Japanese earthquake center known as the Earthquake Disaster Mitigation Center. We have an MOU between EDM and the Mid-America Earthquake Center in place. I am working with their director, Hiro Kameda, and looking into establishing a NEESgrid relationship.

Page 36

The Changing Face of Observational Astronomy

• Large digital sky surveys are becoming the dominant source of data in astronomy: > 100 TB, growing rapidly
  – Current examples: SDSS, 2MASS, DPOSS, GSC, FIRST, NVSS, RASS, IRAS; CMBR experiments; microlensing experiments; NEAT, LONEOS, and other searches for Solar system objects …
  – Digital libraries: ADS, astro-ph, NED, CDS, NSSDC
  – Observatory archives: HST, CXO, space- and ground-based
  – Future: QUEST2, LSST, and other synoptic surveys; GALEX, SIRTF, astrometric missions, GW detectors

• Data sets orders of magnitude larger, more complex, and more homogeneous than in the past

Page 37

The Changing Face of Observational Astronomy

• Virtual Observatory: Federation of N archives
  – Possibilities for new discoveries grow as O(N²)

• Current sky surveys have proven this
  – Very early discoveries from Sloan (SDSS), 2 Micron (2MASS), and Digital Palomar (DPOSS) surveys

• see http://www.us-vo.org
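One way to read the O(N²) claim above, as a brief counting sketch: federating N archives makes every pairwise cross-correlation of archives available, and the number of such pairs grows quadratically with N.

```latex
% Number of distinct archive pairs available for cross-correlation
% when N archives are federated:
\binom{N}{2} = \frac{N(N-1)}{2} = O(N^2)
```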

Page 38

Sky Survey Federation

Page 39

Mining Data from Dozens of Instruments / Surveys is Frequently a Critical Aspect of Doing Science

• The ability to federate survey data is enormously important

• Studying the Cosmic Microwave Background – a key tool in studying the cosmology of the universe – requires combined observations from many instruments in order to isolate the extremely weak signals of the CMB

• The datasets that represent the material “between” us and the CMB are collected from different instruments and are stored and curated at many different institutions

• This is immensely difficult without approaches like the National Virtual Observatory, which provides a uniform interface to all of the different data formats and locations

(Julian Borrill, NERSC, LBNL)

Page 40

NVO Approach

• Focus is on adapting emerging information technologies to meet the astronomy research challenges
  – Metadata, standards, protocols (XML, http)
  – Interoperability
  – Database federation
  – Web Services (SOAP, WSDL, UDDI)
  – Grid-based computing (OGSA)

• Federating databases is difficult, but very valuable
  – An XML-based mark-up for astronomical tables and catalogs – VOTable
  – Developed a metadata management framework
  – Formed international "registry", "dm" (data models), "semantics", and "dal" (data access layer) discussion groups

• As with NEESgrid, Grids are helping to unify the community

Page 41

NVO Image Mosaicking

• Specify a box by position and size
• The SIAP server returns the relevant images, each with
  – Footprint
  – Logical Name
  – URL

• Can choose a standard URL (http://.......) or an SRB URL (srb://nvo.npaci.edu/…..)
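A hedged Python sketch of the query pattern on this slide: form a Simple Image Access (SIA) style request for a box given by position and size, then read the footprint, logical name, and URL of each returned image from the VOTable reply. The endpoint, column names, and the sample reply below are placeholders invented for illustration, not an actual NVO service.

```python
# Illustrative SIA-style query and VOTable parsing (standard library only).
import urllib.parse
import xml.etree.ElementTree as ET

def siap_query_url(base, ra_deg, dec_deg, size_deg):
    """Build a query URL for a box centred at (ra, dec) with the given size."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": str(size_deg)}
    return base + "?" + urllib.parse.urlencode(params)

SAMPLE_REPLY = """
<VOTABLE><RESOURCE><TABLE>
  <FIELD name="name"/> <FIELD name="footprint"/> <FIELD name="url"/>
  <DATA><TABLEDATA>
    <TR><TD>dposs-f123</TD><TD>210.1 54.2 210.6 54.7</TD>
        <TD>srb://nvo.npaci.edu/dposs/f123.fits</TD></TR>
    <TR><TD>2mass-h98</TD><TD>210.0 54.0 210.5 54.5</TD>
        <TD>http://irsa.example.org/2mass/h98.fits</TD></TR>
  </TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>
"""

def parse_images(votable_xml):
    """Yield one record per returned image row."""
    root = ET.fromstring(votable_xml)
    for row in root.iter("TR"):
        name, footprint, url = (td.text for td in row.findall("TD"))
        yield {"name": name, "footprint": footprint, "url": url}

if __name__ == "__main__":
    print(siap_query_url("http://nvo.example.org/siap", 210.3, 54.4, 0.5))
    for image in parse_images(SAMPLE_REPLY):
        print(image["name"], "->", image["url"])
```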

Page 42

Atlasmaker Virtual Data System

[Figure: A user request goes to a request manager (a higher-level Grid service built on core Grid services). If the mosaicked data is already on file it is returned directly; if not, the request manager (a) gets the raw data from NVO resources, (b) computes the mosaic on TeraGrid/IPG compute resources, and (c) stores and returns the result. Metadata repositories are federated by OAI, data repositories by SRB, and compute resources by TG/IPG.]

Page 43

Background Correction

[Figure: Example sky image before (uncorrected) and after (corrected) background correction.]

Page 44

NVO Components

[Figure: NVO component architecture – cone search services and simple image access services over data archives, Grid services over computing resources, a cross-correlation engine, resource/service registries, visualization, and Web Services, linked by VOTable data exchange (including streaming) and UCDs.]

Page 45

International Virtual Observatory Collaborations

• Astrophysical Virtual Observatory (European Commission)
• AstroGrid, UK e-Science program
• Canada
• VO India
• VO Japan (leading the work on the VO query language)
• VO China
• German AVO
• Russian VO
• e-Astronomy Australia
• IVOA (International Virtual Observatory Alliance)

US contacts: Alex Szalay [email protected], Roy Williams [email protected], Bob Hanisch <[email protected]>

Page 46

Where to in the Future? The Potential of a Semantic Grid / Knowledge Grid: Combining Semantic Web Services and Grid Services

• Even when we have well integrated Web+Grid services we still do not provide enough structured information and tools to let us ask “what if” questions, and then have the underlying system assemble the required components in a consistent way to answer such a question.

Page 47

Beyond Web Services and Grids

• A commercial example of a "what if" question:
  – What does my itinerary look like if I wish to go from SFO to Paris (CDG), and then to Bucharest?
  – In Bucharest I want a 3 or 4 star hotel that is within 3 km of the Palace of the Parliament, and the hotel cost may not exceed the U.S. Dept. of State Foreign Per Diem Rates.

Page 48

Beyond Web Services and Grids

• To answer such a question – a relatively easy but tedious task for a human – the system must "understand" the relationships between maps and locations, and between per diem charts and published hotel rates, and it must be able to apply constraints (< 3 km, 3 or 4 star, cost < per diem rate, etc.)

• This is the realm of “Semantic Grids”
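A hand-written Python illustration of the kind of constraint application a Semantic Grid would automate for the itinerary question above. The hotel data and the per diem figure below are made up; the point is only that the system must relate several descriptions (star rating, distance taken from a map, rate versus per diem) and filter on all of them at once.

```python
# Toy constraint filtering for the Bucharest hotel example (data is invented).

HOTELS = [
    {"name": "Hotel A", "stars": 4, "km_to_palace": 1.8, "rate_usd": 150},
    {"name": "Hotel B", "stars": 3, "km_to_palace": 2.5, "rate_usd": 210},
    {"name": "Hotel C", "stars": 5, "km_to_palace": 0.9, "rate_usd": 300},
]

PER_DIEM_LODGING_USD = 180   # placeholder value, not the actual State Dept. rate

def acceptable(hotel):
    """Apply all three constraints from the question at once."""
    return (hotel["stars"] in (3, 4)
            and hotel["km_to_palace"] <= 3.0
            and hotel["rate_usd"] <= PER_DIEM_LODGING_USD)

if __name__ == "__main__":
    for hotel in filter(acceptable, HOTELS):
        print(hotel["name"])     # only Hotel A satisfies all three constraints
```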

Page 49

Semantic Grids / Knowledge Grids

• Work is being adapted from the Artificial Intelligence community to provide [4]
  – "Ontology languages" to extend metadata to represent relationships
  – Language constructs to express rule-based / constraint relationships among, and generalizations of, the extended terms

Page 50

Future Cyberinfrastructure – technology and impact

Resource Description Framework (RDF) [7]
  – Expresses relationships among "resources" (URI(L)s) in the form of object-attribute-value (property) statements. Values can themselves be other resources, so arbitrary relationships between multiple resources can be described.
  – RDF uses XML for its syntax.
  – Can answer questions like "What are a particular property's permitted values, which types of resources can it describe, and what is its relationship to other properties?"

Resource Description Framework Schema (RDFS) [7]
  – An extensible, object-oriented type system that effectively represents and defines classes.
  – Object-oriented structure: class definitions can be derived from multiple superclasses, and property definitions can specify domain and range constraints.
  – Can now represent tree-structured information (e.g., taxonomies).
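A minimal Python sketch of the RDF idea summarized above: statements are (resource, property, value) triples, values may themselves be resources, and relationship questions become simple queries over the triples. The URIs and properties below are invented for illustration.

```python
# Toy triple store illustrating object-attribute-value statements and
# relationship chasing across resources.

TRIPLES = [
    ("urn:dataset:cmb-map-1", "producedBy",   "urn:instrument:boomerang"),
    ("urn:dataset:cmb-map-1", "hasFrequency", "150 GHz"),
    ("urn:instrument:boomerang", "operatedBy", "urn:org:lbnl"),
]

def values_of(resource, prop):
    """All values of a given property for a given resource."""
    return [v for (s, p, v) in TRIPLES if s == resource and p == prop]

def follow(resource, *props):
    """Chase a chain of properties, e.g. dataset -> instrument -> operator."""
    current = [resource]
    for prop in props:
        current = [v for r in current for v in values_of(r, prop)]
    return current

if __name__ == "__main__":
    print(values_of("urn:dataset:cmb-map-1", "hasFrequency"))           # ['150 GHz']
    print(follow("urn:dataset:cmb-map-1", "producedBy", "operatedBy"))  # ['urn:org:lbnl']
```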

Page 51

Future Cyberinfrastructure – technology and impact

Ontology Inference Layer (OIL) [8]
  – OIL inherits all of RDFS and adds the ability to express class relationships using combinations of intersection (AND), union (OR), and complement (NOT). Supports concrete data types (integers, strings, etc.)
  – OIL can state conditions for a class that are both sufficient and necessary. This makes it possible to perform automatic classification: given a specific object, OIL can automatically decide to which classes the object belongs.
  – This is the functionality that should make it possible to ask the sort of constraint- and relationship-based questions illustrated above.

OWL (DAML+OIL) + …… [9]
  – Knowledge representation and manipulation with well-defined semantics and representation of constraints and rules for reasoning

Page 52

Semantic Grid Capabilities

• Based on these technologies, the emerging Semantic Grid [6] / Knowledge Grid [5] services will provide several important capabilities

1) The ability to answer “what if” questions by providing constraint languages that operate on ontologies that describe content and relationships of scientific data and operations, thus “automatically” structuring data and simulation / analysis components into Grid workflows whose composite actions produce the desired information

Page 53

Semantic Grid Capabilities

2) Tools, content description, and structural relationships so that when trying to assemble multi-disciplinary simulations, an expert in one area can correctly organize the other components of the simulation without having to involve experts in all of the ancillary sub-models (components)

Page 54

Future Cyberinfrastructure

• Much work remains to make this vision a reality

• The Global Grid Forum has recently established a Semantic Grid Research Group [10] to investigate and report on the path forward for combining Grids and Semantic Web technology.

Page 55

Thanks to Colleagues who Contributed Material to this Talk

• Dan Reed, Principal Investigator, NSF NEESgrid; Director, NCSA and the Alliance; Chief Architect, NSF ETF TeraGrid; Professor, University of Illinois - [email protected]

• Ian Foster, Argonne National Laboratory and University of Chicago, http://www.mcs.anl.gov/~foster

• Dr. Robert Hanisch, Space Telescope Science Institute, Baltimore, Maryland

• Roy Williams, Cal Tech; Dan Abrams, UIUC; Paul Avery, Univ. of Florida; Alex Szalay, Johns Hopkins U.; Tom Prudhomme, NCSA

Page 56

[Figure: The DOE Science Grid vision – Science Portals for collaboration and problem solving, built on Web Services and Grid Services (secure and uniform access to and management of distributed resources), linking supercomputing and large-scale storage, high-speed networks, the computing and storage of scientific groups, and facilities such as the Spallation Neutron Source, the Advanced Photon Source, a supernova observatory, macromolecular crystallography, high energy physics, advanced engine design, and advanced chemistry.]

Page 57

Notes

[1] "The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications," William E. Johnston. http://www.itg.lbl.gov/~johnston/Grids/homepage.html#CI2002

[2] "DOE Office of Science, High Performance Network Planning Workshop." August 13-15, 2002: Reston, Virginia, USA. http://doecollaboratory.pnl.gov/meetings/hpnpw

[3] "Developing Grid Computing Applications," in IBM developerWorks: Web services articles. http://www-106.ibm.com/developerworks/library/ws-grid2/?n-ws-1252

[4] See "The Semantic Web and its Languages," an edited collection of articles in IEEE Intelligent Systems, Nov./Dec. 2000. D. Fensel, editor.

[5] For an introduction to the ideas of Knowledge Grids I am indebted to Mario Cannataro, Domenico Talia, and Paolo Trunfio (CNR, Italy). See www.isi.cs.cnr.it/kgrid/

[6] For an introduction to the ideas of Semantic Grids I am indebted to Dave De Roure (U. Southampton), Carole Goble (U. Manchester), and Geoff Fox (U. Indiana). See www.semanticgrid.org

[7] "The Resource Description Framework," O. Lassila. Ibid.

[8] "FAQs on OIL: Ontology Inference Layer," van Harmelen and Horrocks. Ibid.; and "OIL: An Ontology Infrastructure for the Semantic Web." Ibid.

[9] "Semantic Web Services," McIlraith, Son, and Zeng. Ibid.; and "Agents and the Semantic Web," Hendler. Ibid.

[10] See http://www.semanticgrid.org/GGF. This GGF Research Group is co-chaired by David De Roure <[email protected]>, Carole Goble <[email protected]>, and Geoffrey Fox <[email protected]>.