UK e-Science Grid Infrastructure meets Biological Research Challenges Malcolm Atkinson Director of...

Post on 28-Mar-2015


UK e-Science

Grid Infrastructure meets Biological Research Challenges

Malcolm Atkinson

Director of National e-Science Centre

www.nesc.ac.uk

2nd October 2002

The UK Biological Grid: Data and Computation

The Wellcome Trust Genome Campus

Hinxton, Cambridgeshire

Overview

UK e-Science: Reminder of Investment and Infrastructure

International e-Science: Examples and Collaboration

Data Access and Integration: Lego Bricks for Scientific Application Developers

A Computer Scientist’s View of Biology: Diversity and Opportunity

The Way Ahead

e-Science

Fundamentally about Collaboration

Sharing: Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Scientists (Biologists) have done this for Centuries

e-Science (take 2)

Fundamentally about Collaboration

Sharing: Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Text, digital media, structured, organised & curated data, computable models, visualisation, shared instruments, shared systems, shared administration, …

Nationally & Internationally Distributed, …

Routine, Daily, Automated, …

That Requires very Significant Investment in Digital Systems and their Support

e-Science (take 3)

Fundamentally about Collaboration

Sharing: Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Digital networks, digital work-places, digital instruments, …

Metadata, ontologies, standards, shared curated data, shared codes, …

Common platforms, shared software, shared training, …

The Grid SHOULD make this much easier by providing a common, supported, high level of Software and Organisational infrastructure

Authentication, Authorisation, Accounting, Provenance, Policies, …

Shared Provision of Platform,

Grid Expectations

Persistence: Always there, Always Working, Always Supported

Stability: You can build on foundations that don’t move

Trustworthy & Predictable: Honours commitments; Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance

High-level & Extensible: The capabilities you need are already there

Ubiquitous: Your collaborators use it

Grid Reality

Persistence: Always there, Always Working, Always Supported

Stability: You can build on foundations that don’t move

Trustworthy & Predictable: Honours commitments; Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance

High-level & Extensible: The capabilities you need are already there

Ubiquitous: Your collaborators use it

Political, Economic & Technical issues to Solve

Early days, but Open Grid Services link with Web Services + GGF standardisation

Not yet, but very substantial global effort to achieve this

Good basis for extension: Commitment to basic functionality; WS + Community effort

Global & Industrial Rallying Cry: Must work with Web Services

UK Grid Network (map)

Sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton

National e-Science Centre

Access Grid always-on video walls

HPC(x)

National e-Science Centre

Events: Workshops, Research Meetings, International Meetings

History of Events: GGF5, HPDC11, Summer school; > 50 workshops held; > 1000 people in total; Many return often

Planned Events: 25 workshops; Conferences to 2005

Visitors: 3 arrived, 4 arranged

International collaboration, visits & visitors

China, Argonne National Lab, SDSC, NCSA, …

Centre Projects, Pilot Projects, Regional Support, Research Projects

EPSRC, MRC, WT, SHEFC

A day in the life of NeSC

Online Access to Scientific Instruments

DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago

Figure labels: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls

From Steve Tuecke 12 Oct. 01

Figure labels: UCSF, UIUC

From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign

DataGrid Testbed Sites (map, >40)

Sites shown: Dubna, Moscow, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, CERN, IPSL, Estec, KNMI

Legend: HEP sites, ESA sites

Francois.Etienne@in2p3.fr - Antonia.Ghiselli@cnaf.infn.it

A Simplified Grid Anatomy

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Operations Team

Application Developers

Distributed

Owners

Scientific Users

A Biological Grid Anatomy

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Distributed

Biological Users

Data Access

Data Integration

Structured Data

Database Growth

PDB protein structures


Scientific Data

Deluge of Data: Exponential growth

Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months

Not How big it is but What you do with it

Sharing; Curation; Metadata; Automated movement, access & integration; Computational Access

Scientific Data

Deluge of Data: Exponential growth

Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months

Not How big it is but How you Embrace & Manage Change

The Database is a Knowledge chest; The Database is a Communication Hub; Autonomously Managed (Curated) change; An Essential part of e-BioMedical Science
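The doubling times quoted on the Scientific Data slides can be compared directly. The following is a minimal sketch, assuming steady exponential growth at exactly the quoted rates; the function names and the three-year horizon are illustrative choices, not part of the talk. It estimates how the cost of storing a field's data changes when the data doubles faster than bytes per dollar.

```python
# Doubling times in months, as quoted on the slide.
DATA_DOUBLING_MONTHS = {
    "Astronomy": 12,
    "Bio-Sequences": 9,
    "Functional Genomics": 6,
}
# The slide gives a range for bytes/dollar; both ends are evaluated.
BYTES_PER_DOLLAR_DOUBLING_MONTHS = (12, 18)


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after elapsed_months of steady doubling."""
    return 2.0 ** (elapsed_months / doubling_months)


def relative_storage_cost(data_doubling: float, price_doubling: float,
                          elapsed_months: float) -> float:
    """Cost of storing all the data, relative to today (1.0 = unchanged)."""
    return (growth_factor(data_doubling, elapsed_months)
            / growth_factor(price_doubling, elapsed_months))


if __name__ == "__main__":
    years = 3  # illustrative horizon, an assumption
    for field, months in DATA_DOUBLING_MONTHS.items():
        for price in BYTES_PER_DOLLAR_DOUBLING_MONTHS:
            cost = relative_storage_cost(months, price, 12 * years)
            print(f"{field}: bytes/dollar doubling every {price} months "
                  f"-> storage cost x{cost:.1f} after {years} years")
```

Under these assumptions, any field whose data doubles faster than bytes per dollar sees its storage bill grow even as hardware gets cheaper, which is one way to read the slide's emphasis on managing change over raw size.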

Wellcome Trust: Cardiovascular Functional Genomics

Sites (diagram): Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands

Shared data; Public curated data

Data Access & Integration

Central to e-Science, especially Earth Sciences, Ecology, Biology & Medicine

Collaboration: Shared Databases; Curated Knowledge; Accumulated Observations; Accumulated Simulations

Computation: Data mining; Input to models; Calibration of models

Presentation: Publication of results; Visualisation

GGF DAIS WG

Chairs: Norman Paton (Manchester Uni.), Leanne Guy (CERN), Dave Pearson (Oracle UK)

Activity: BoF GGF4 Toronto; WG Meeting GGF5 Edinburgh; Papers for GGF6; Workshops & Mail lists

Goals: Agree Standards for Database Access & Integration; Freely available reference implementations

OGSA-DAI one source & focus for discussions

Norman Paton, Inderpal Narang, Leanne Guy, Susan Malaika, Greg Riccardi, …

OGSA-DAI project

Lego kit for Data Access & Integration

Components for e-Science Applications

Accelerated Application Development

Multiple Data Models

Distributed Data

Access via Grid & Proxies

Integration, Translation & Transformation

Open Source Reference Implementation: For DAIS-WG standard

Trigger for Component Construction: Start a community

OGSA-DAI Partners

EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle

£3 million, 18 months, started February 2002

Map labels: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Newcastle, Manchester, Cambridge, Hinxton, EPCC & NeSC, IBM USA, IBM Hursley, Oracle

Primary Components

Diagram labels: Client, Consumer, GDS, GDSF, GDSR, DB
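The Primary Components diagram names a Client, a Consumer, a GDS, a GDSF, a GDSR and a DB, but the transcript does not preserve how they are wired together. The following is a minimal sketch of one plausible interaction, assuming GDSR, GDSF and GDS stand for a Grid Data Service Registry, Factory and Service. Every class and method name here is hypothetical and is not the OGSA-DAI API; an in-memory SQLite database with made-up rows stands in for the DB.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Tuple
Consumer = Callable[[Row], None]


class GridDataService:
    """Assumed GDS: gives access to one database (DB)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def perform_script(self, sql: str, consumer: Consumer) -> None:
        """Run the query and deliver each result row to the consumer."""
        for row in self._connection.execute(sql):
            consumer(row)


class GridDataServiceFactory:
    """Assumed GDSF: creates GDS instances for one data resource."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def create_service(self) -> GridDataService:
        return GridDataService(self._connection)


class GridDataServiceRegistry:
    """Assumed GDSR: lets a client look up a factory by resource name."""

    def __init__(self) -> None:
        self._factories: Dict[str, GridDataServiceFactory] = {}

    def register(self, name: str, factory: GridDataServiceFactory) -> None:
        self._factories[name] = factory

    def find(self, name: str) -> GridDataServiceFactory:
        return self._factories[name]


def demo() -> List[Row]:
    # Made-up sample data standing in for a real scientific database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE protein (id TEXT, length INTEGER)")
    db.executemany("INSERT INTO protein VALUES (?, ?)",
                   [("P1", 120), ("P2", 340)])

    registry = GridDataServiceRegistry()
    registry.register("proteins", GridDataServiceFactory(db))

    # Client: find the factory, create a service, send it a script.
    received: List[Row] = []  # stands in for the Consumer
    service = registry.find("proteins").create_service()
    service.perform_script("SELECT id, length FROM protein ORDER BY id",
                           received.append)
    return received
```

Separating the Consumer from the Client, as the diagram does, means the party that submits a request need not be the party that receives the results; in the sketch that is simply a callable passed alongside the query.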

Advanced Components

Diagram labels: Client, Consumer, GDS, GDT, Translation (x2), DB, GDS:PerformScript

Composed Components

Diagram labels: Client, Consumer, GDS, GDT (x3), Translation (x2), GDS:performScript (x4)

Distributed Query

Diagram labels: Registry (R), Factory (F), Client, Consumer, GDS, DB, DQP, Evaluator (x3), GDT, GDTV, QPM, NS, PNM, T, Q; numbered steps 1 to 8

Legend: DQP: Distributed Query Processor; GDT: Grid Data Transport; T: Translation; Q: Query; GDTV: Grid Data Transport Vehicle; F: Factory; QPM: Query Progress Monitor; PNM: Progress Notification Message; AM: Application Metadata; CRM: Computational Resource Metadata; NS: Notification Sink
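The Distributed Query diagram lists a Distributed Query Processor, several Evaluators, Progress Notification Messages and a Notification Sink, but its arrows and step order are lost in this transcript. The following is a minimal sketch of the general idea only, assuming a simple fan-out and merge that runs sequentially. All function names, the table schema and the sample rows are made up for illustration; nothing here reproduces the actual OGSA-DAI design.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

Row = Tuple
NotificationSink = Callable[[str], None]  # NS: receives progress messages


def make_source(rows: Iterable[Row]) -> sqlite3.Connection:
    """Build an in-memory database standing in for one remote data source."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sequence (id TEXT, length INTEGER)")
    db.executemany("INSERT INTO sequence VALUES (?, ?)", rows)
    return db


def evaluate(source: sqlite3.Connection, min_length: int) -> List[Row]:
    """Evaluator: run the sub-query against a single source."""
    cursor = source.execute(
        "SELECT id, length FROM sequence WHERE length >= ?", (min_length,))
    return cursor.fetchall()


def distributed_query(sources: List[sqlite3.Connection], min_length: int,
                      sink: NotificationSink) -> List[Row]:
    """DQP: send the sub-query to each source, merge, report progress."""
    merged: List[Row] = []
    for index, source in enumerate(sources, start=1):
        merged.extend(evaluate(source, min_length))
        # PNM: one progress notification message per completed source.
        sink(f"source {index}/{len(sources)} done, {len(merged)} rows so far")
    return sorted(merged)


if __name__ == "__main__":
    # Made-up rows; two sources stand in for two distributed databases.
    sources = [make_source([("A", 100), ("B", 50)]),
               make_source([("C", 300)])]
    print(distributed_query(sources, 80, print))
```

A real processor would dispatch the evaluators in parallel and stream the rows between services; the sequential loop keeps the sketch short while still showing the split between evaluation, merging and progress reporting.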

OGSA-DAI Time Line

Feb ’02 May ’02 Jul ’02 Sep ’02 Dec ’02 Feb ’03 May ’03 Sep ’03

Ship Alpha Release for GT3 Integration

RDB + GT2 / OGSA Prototypes Available

XML + OGSA Prototype Available

Design Documents & Demos for DAIS WG @ GGF5

XML + OGSA Prototypes for Early Adopters

WS + GSI UK support ( > 100 downloads)

Phase 1 Starts

Phase 2 Starts

Presentation & Beta @ GGF7

GGF6 WG Papers & Prototypes

Productisation, RAMPS & Extension

OGSA-DAI Summary

On Schedule & Going Well: Contributions via DAIS-WG @ GGF5 & 6; Releases scheduled with GT3 releases

Status: Early Days

Released prototypes; Tested Architectural Design; Using OGSA; Working with Early Adopter Pilot Projects (AstroGrid & MyGrid)

Influence OGSA-DAI direction: Via DAIS-WG & Direct messages to us

Biomedical e-Scientists

Is this one species?

Understanding bird energy

Understanding a river / ocean interaction

Understanding a biochemical pathway

Understanding a cell

Understanding a Heart or Brain

Understanding Rhododendra

Understanding Evolution

…

No One-Size fits all solutions, But sharable re-usable components

Opportunities

Many, many …

More than we can address

Compute needs; Data management needs; Data integration needs; …

Must choose some pioneers: To meet a range of common requirements; To provoke rich & high-level platform; To generate re-usable components

A Long-Term Commitment Needed

Advancing Biological Grid

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Distributed

Biological Users

Data Access

Data Integration

Structured Data

Biomedical (Grid) Application Component Library

Summary

e-Science: Data as well as Compute Challenges

Needed to be put together

Need ubiquitous supported consistent platforms

Grid: A (potentially) invaluable platform; Only show in town

Data Integration: Hard; Develop & Use Standard kit of parts; Started to build the kit

Opportunities: No one-size fits all, but re-usable subsystems; Invest in wider range of Problem driven pioneering; Strategic choices needed