All Hands Meeting 2004 BIRN Coordinating Center Status Report Mark Ellisman Philip Papadopoulos.

All Hands Meeting 2004

BIRN Coordinating Center Status Report

Mark Ellisman

Philip Papadopoulos

Page 2

What is BIRN?

[Figure labels: 150,000; LONI; harvard; ncrr.nih]

Page 3

Biomedical Informatics & Research

Biocomplexity

Discovery and Systems research approaches complement Hypothesis-based research

Integrative, multidisciplinary team approach adapted for complex queries versus focused approach for hypothesis-driven research

Team approach more dependent on advanced technologies and instrumentation which generate large data sets

Information management at core of biomedical research for 21st century and beyond

Page 4

Overview of the BIRN-CC Roadmap

Deliver and maintain a robust and scalable PRODUCTION Grid for the collaborative sharing, analysis and interrogation of biomedical data

Provide system integration to bring user applications into BIRN

Provide a consistent and scalable software delivery mechanism

Facilitate the use of advancing information technologies by biomedical scientists - “Cyberinfrastructure” and the “Grid”

Be the biomedical applications driver framing requirements for the rapidly evolving GRID infrastructure

“Enforce the AEIOU’s – Accessibility, Extensibility, Interoperability, Openness, Usability, Scalability”

Page 5

Integrated Cyberinfrastructure System meeting the needs of multiple communities

Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group

[Figure labels]
• Applications: Environmental Science, High Energy Physics, Biomedical Informatics, Geoscience
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Development Tools & Libraries
• Grid Services & Middleware
• Distributed Resources (computation, communication, storage, etc.)
• Hardware
• Education and Training
• Discovery & Innovation

Page 6

BIRN Core Software Infrastructure

[Diagram labels]
• BIRN Portal
• Web Server / Applications Server: Portlets; Grid Services
• Collaboration; Data Management; Viewing/Visualization; Pipelines; Queries/Results
• BIRN Toolkit: Custom APIs; mediator client; GridPort Services; registry; planner; mediator gateway; executor; Statistics/Analysis; Spatial; Ontology; PACS
• Grid Middleware Services: File/Data; Job Management; Authentication; Information
• GASS; GRAM; GSI; SRB; GridFTP; NWS; MDS/GRIS
• Remote Servers / Sites: Data Sources; Applications; Computation
• Distributed Resources

• BIRN builds on evolving community standards for middleware

• Adds new capabilities required by projects

• Does System Integration of domain-specific tools building a distributed infrastructure

• Utilizes commodity hardware and stable networks for baseline connectivity

Page 7

BIRN Core Software Infrastructure

[Diagram overlay labels]
• Friendly Work Facilitating Portals: Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
• Your Specific Tools & User Apps.
• Shared Tools for Multiple Science Domains
• Development Tools & Libraries
• Grid Services & Middleware
• Distributed Computing, Instruments and Data Resources

Page 8

BIRN Core Software Infrastructure

[Diagram labels: science domains]
• Biomedical Informatics “BIRN”
• High Energy Physics GriPhyN
• Geosciences “GEON”
• Bays and Rivers (Moore Found.)
• Earthquake “NEES”
• Ocean Observing “Looking”

Page 9

BIRN is Pioneering: We are Making Unique and Fundamental Contributions to Establish Working GRIDs

BIRN is setting an example for other Grid project deployments [i.e. use of Rocks and automated distribution mechanisms]
• GEON, etc…

BIRN is a driver application for other major GRID initiatives
• Common security APIs being used within BIRN, Telescience, GEON
• OptIPuter - research into next generation networking - BIRN is the Bioscience Driver
• Drives requirements to the Global Grid Forum and Internet 2 development efforts

Page 10

Grid Infrastructure in Action

The Grid is already having an impact…
• Many projects in many subjects: Life sciences, Medicine, Environment, Engineering, Materials, Chemistry, Physics
• BIRN embodies the most innovative use of data, metadata & portals

BIRN cited as successful model of grid computing.

Page 11

The Grid is becoming the backbone for collaborative science and data sharing

Page 12

BIRN Infrastructure Provides…

high performance connectivity between distributed resources (computation and data storage)
• JHU utilizing TeraGrid resources pulling data from SRB

secure access to large volumes of distributed data

distributed high performance computing resources
• BIRN just received an NSF Large Resource Allocation Committee award (450,000 service units, i.e. processor hours)

frameworks (standards, APIs, services) for the integration and interoperation of tools, users, data and computing resources
• Improved high level “wrapper” tools
• common authentication protocol

Page 13

Access to Grid Resources

• Intuitive user interfaces to access grid based computational analyses

• Transparent access to distributed data found within the BIRN Data Grid

Case Study: JHU - LDDMM grid computing launched from BIRN Portal

Semi Automatic Shape Analysis study utilizing compute intensive analyses (i.e. Large Deformation Diffeomorphic Metric Mapping)

Page 14

BIRN has the Advantage of having Developed an “End-to-End” Infrastructure in the context of distributed biomedical research projects.

Consists of all the components required to effectively share and collaboratively explore data
• The BIRN Rack (BIRN site infrastructure)
• The BIRN Virtual Data Grid
• The BIRN Mediation Infrastructure
• The BIRN Portal

The system integration, development, deployment and management of this infrastructure is the main focus of activities within the BIRN Coordinating Center

Page 15

Improving the BIRN Environment

Continually improve the BIRN software infrastructure (i.e. performance, robustness, end-to-end integration, and interoperability)

Standardize the software delivery process by providing twice yearly scheduled software releases – April & October
• Develop internal processes for alpha, beta and production releases
• Instantiate robust development, staging and production environments
• Improved documentation and tutorials for all components

Provide automated deployment mechanisms

Page 16

BIRN Portal

Updated BIRN Portal with new and improved features currently in production

Worked with test beds to improve the usability and performance of the BIRN Portal
• Improved Performance
• Updated Portal API for more robust operation
• Implemented guest pages and accounts
• Enhanced security and integration with the BIRN Authentication infrastructure
• Updated look and feel for improved usability
• Providing online documentation & tutorials

Page 17

The problem with portals …

… is that you rely on the integrity of the gatekeeper

Page 18

Benefits of a Data Grid

Uniform interface for connecting to heterogeneous distributed data resources
• Allows for any “grid enabled” tool to interact with data no matter where it is located or what it is located on

Allows for the seamless creation and management of distributed data sets
• Distributed data appear as a single managed collection both to users and tools

Access is Managed using GRID Authentication through BIRN Portal
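
The "single managed collection" idea can be illustrated with a small sketch. This is not the SRB API or any BIRN code: the class names (`StorageResource`, `InMemoryResource`, `DataGrid`), the resource names and the paths are all hypothetical, and the in-memory back ends merely stand in for real disks, archives or databases. The sketch only shows how a catalog of logical names lets a tool read data without knowing where it is stored.

```python
from abc import ABC, abstractmethod


class StorageResource(ABC):
    """Hypothetical interface that every storage back end implements."""

    @abstractmethod
    def read(self, physical_path: str) -> bytes:
        """Return the bytes stored at a back-end specific path."""


class InMemoryResource(StorageResource):
    """Stand-in for a real back end (disk, archive, database)."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, physical_path):
        return self._files[physical_path]


class DataGrid:
    """Catalog that maps logical names to (resource, physical path)."""

    def __init__(self):
        self._resources = {}
        self._catalog = {}

    def add_resource(self, name, resource):
        self._resources[name] = resource

    def register(self, logical_name, resource_name, physical_path):
        if resource_name not in self._resources:
            raise KeyError(f"unknown resource: {resource_name}")
        self._catalog[logical_name] = (resource_name, physical_path)

    def listing(self, prefix=""):
        # Users and tools see one sorted collection, whatever the back end.
        return sorted(n for n in self._catalog if n.startswith(prefix))

    def read(self, logical_name):
        resource_name, physical_path = self._catalog[logical_name]
        return self._resources[resource_name].read(physical_path)


# Made-up demo data: two sites with different physical naming schemes.
grid = DataGrid()
grid.add_resource("site-a-disk", InMemoryResource({"/export/s1.img": b"scan one"}))
grid.add_resource("site-b-archive", InMemoryResource({"vol7:0042": b"scan two"}))
grid.register("/demo/study/scan1.img", "site-a-disk", "/export/s1.img")
grid.register("/demo/study/scan2.img", "site-b-archive", "vol7:0042")

for name in grid.listing("/demo/study/"):
    print(name, len(grid.read(name)))
```

A tool written against `DataGrid.read` keeps working when a file moves to another resource, because only the catalog entry changes.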

Page 19

Security: Access and Audit
• Intuitive interfaces to core infrastructure (e.g. the BIRN Virtual Data Grid) and services (e.g. full auditing on BIRN data or image viewing)

Page 20

Google is not a portal………

Carrot juice cures piles

A result?

From Ken Peach, Rutherford Labs UK

Page 21

If you dig deep enough you may get what you want (but perhaps not exactly what you need)

Carrot juice cures piles

1,680

Drink a juice of turnip leaves, spinach, water cress and carrots (equal quantity)

Page 22

Example of Data Mediation within BIRN

Find all joint projects between UCSD and Duke w/ relevance to Lewy Body Disease

Page 23

Benefits of Data Mediation

Provide means to locate, access and interrogate data contained in distributed databases

Can add new resources without modifying existing data resources

Promote flexible views on top of the data

Semantically and spatially integrate multi-scale and multi-modal data
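
A minimal sketch of the mediation idea, under stated assumptions: it is not the BIRN mediator, and the `Mediator` class, table layouts, site labels and project names are invented for illustration. Two in-memory SQLite databases with different local schemas stand in for site databases. Each is registered with a wrapper query that maps its schema onto one shared view `(project, topic)`, so the sources themselves are never modified.

```python
import sqlite3


def make_source(create_sql, insert_sql, rows):
    """Build an in-memory database standing in for one site's data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn


class Mediator:
    """Hypothetical mediator: one shared view over independently owned sources."""

    def __init__(self):
        self._sources = {}

    def register(self, site, conn, view_sql):
        # view_sql must return two columns: (project, topic).
        self._sources[site] = (conn, view_sql)

    def projects(self, topic):
        """Return {project: set of sites} for projects tagged with topic."""
        found = {}
        for site, (conn, view_sql) in self._sources.items():
            for project, row_topic in conn.execute(view_sql):
                if row_topic == topic:
                    found.setdefault(project, set()).add(site)
        return found

    def joint_projects(self, topic, site_a, site_b):
        """Projects on the topic that appear at both sites."""
        hits = self.projects(topic)
        return sorted(p for p, sites in hits.items() if {site_a, site_b} <= sites)


# Made-up rows; the two sites deliberately use different schemas.
site_one = make_source(
    "CREATE TABLE studies (title TEXT, disease TEXT)",
    "INSERT INTO studies VALUES (?, ?)",
    [("project-alpha", "Lewy Body Disease"), ("project-beta", "Other")],
)
site_two = make_source(
    "CREATE TABLE proj (name TEXT, keyword TEXT)",
    "INSERT INTO proj VALUES (?, ?)",
    [("project-alpha", "Lewy Body Disease"), ("project-gamma", "Lewy Body Disease")],
)

mediator = Mediator()
mediator.register("UCSD", site_one, "SELECT title, disease FROM studies")
mediator.register("Duke", site_two, "SELECT name, keyword FROM proj")

print(mediator.joint_projects("Lewy Body Disease", "UCSD", "Duke"))
```

Adding a third site is one more `register` call with its own wrapper query, which is the sense in which new resources can join without changes to existing ones.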

Page 24

BIRN Data Mediation

Version 2.0 of the BIRN mediator is currently in alpha testing (i.e. as a core component of the BIRN 2.0 release)

Improvements to the new release
• Enhanced query performance
• Updated registration, query and view building tools
• Support for PostgreSQL databases
• Integrated with BIRN authentication infrastructure

BIRN-CC is exploring additional data mediation approaches with collaborators
• Yale - Query Integrator System (QIS)
• GEON - IBM Information Integrator

Page 25

Page 26

From Vision to Reality

“It’s all in the software”
• “It’s not a bug, it’s a feature”
• “That will be in the next version”
• “When is the next version?”

“I just want to open a file”
“I need to monitor and control who accesses my data”
“How do I locate data of interest to me?”
“I need a boatload of computing, how do I find it?”
“Why the heck isn’t this easier?”

Page 27

BIRN Grid Testbed Sites

New sites and collaborative projects are being added

Page 28

We Began with Standard Hardware

This Jumpstarted BIRN for functionality

Software footprint is managed from the BIRN Coordinating Center

Integration of domain tools, middleware, OS, updates, and more

BIRN expansion/upgrade of existing sites must have a more generic (and less expensive) hardware footprint

Page 29

BIRN CC Software Concerns & Operations

Deploy/Manage/Update Common Services
• Portal/Website
• Security Infrastructure
• Metadata Catalog – SRB MCAT
• Mediator Registry
• Source code repository
• Java Application Servers

Deploy/Manage/Update Site Racks
• Enterprise Linux
• Databases and Data Grid Clients
• Mediated Data Resources
• BIRN applications (e.g. LONI, 3D-Slicer, FreeSurfer, …)

Page 30

“It’s all in the Software”

Critical Issues
• What is the BIRN Software Stack?
• When is it updated?
• What Services are supported?

Integrated releases of all BIRN software
• Defining components: Input/SW from all of BIRN
  - Candidate software – 3 months prior to release
  - Alpha phase (functionality freeze) – 2 months prior to release
• Defined Schedule
  - April/October releases - 1 month beta cycle

Pre-alpha is defined now – Part of this meeting should be to prioritize components for April ’05.
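
The milestone offsets above can be turned into calendar dates with a little arithmetic. The sketch below rests on assumptions that the slide does not state: each milestone is pinned to the first day of a month, and the one-month beta cycle is taken to start one month before the release. The function and dictionary names are hypothetical.

```python
from datetime import date

# Months before release at which each phase starts (offsets from the slide;
# treating the 1 month beta cycle as "starts 1 month before" is an assumption).
MILESTONE_OFFSETS = {
    "candidate software": 3,
    "alpha (functionality freeze)": 2,
    "beta": 1,
}


def months_before(release: date, months: int) -> date:
    """Return the first day of the month `months` months before `release`."""
    index = release.year * 12 + (release.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def milestones(release_year: int, release_month: int) -> dict:
    """Milestone dates for an April or October release."""
    if release_month not in (4, 10):
        raise ValueError("releases are scheduled for April or October")
    release = date(release_year, release_month, 1)
    plan = {name: months_before(release, offset)
            for name, offset in MILESTONE_OFFSETS.items()}
    plan["release"] = release
    return plan


plan = milestones(2005, 4)
for name, when in plan.items():
    print(f"{when:%B %Y}: {name}")
```

For the April ’05 release this places candidate software in January, the alpha freeze in February and the beta cycle in March.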

Page 31

More on Software Releases

Defined release cycles are intended to
• Provide software stability for users and developers
• Allow everyone to plan on when system changes will occur

As a whole, BIRN will need to prioritize what goes into a release
• There are limited people and testing resources

Transferring software is not a trivial task
• Packaging uncovers system assumptions

We use Rocks to define “appliances”
• 100% automated configuration of endpoints and services
• BIRN tools need to be transferable to other NIH projects

Page 32

I Just Want to Open a File …

BIRN has been built upon data collections
• Data was copied in/out of data grid
• Meta data allows transparent location/querying
• Requires scripts/changes to code

Distributed File System Layer
• Experimenting with AFS
• Feasibility/performance of developing SRBFS not clear

BIRN Application Workflow

Mediated Data

Data CollectionDB

Distributed File System

Local/NFS File System

Mediator – under development (v 2.0)

Oracle/Postgres - SRB

Standard OS

Page 33

I Need to Monitor Access to My Files

Authentication (Identification)
• GSI Certificates
• Managed transparently by the Portal – Username/Passwd
• Have developed a Java Class to encapsulate GSI functionality to ease the development of GSI-aware SW

Access control already built in to Data Collection Management (authorization)
• As we introduce other data modalities, we need to develop a vocabulary that is useful
  - Translate to specific software systems
  - E.g. SRB, Oracle/Postgres Table Security, AFS, GridFTP, …
• There is a dearth of community tools to build upon here
  - BIRN can help drive the community

Page 34

I need to locate data of interest to me

Two ways now:
• Meta data attached to collection-managed data
  - “Retrieve all DAT-KO MRI images”
• Data Mediator
  - Gives the illusion of a single database
  - New relationships among separate databases

What about distributed file systems?
• You get pathnames, only
• You can VI (or emacs) a file – that is, read/write/open/close works as expected
• It is reasonable to look at a DFS as a stepping stone
• Very useful as community working directories where metadata is less important, but access control is critical

Page 35

I need a boatload of computing …

JHU has been experimenting with using TeraGrid and loading data into the BIRN Data Grid
• Their storage resources are at 90+%

Condor is deployed on Racks, but
• We need to look at Use cases and utility.
• Automated data management (Move my data to the computing) is still clumsy at best

Pathfinder applications help to more crisply define the software stack

Page 36

Why the heck isn’t this easier?

It really hasn’t been done before
• A significant number of dimensions
  - Application usage
  - Security requirements
  - Scale of data and of distributed systems
• Software is evolving to be more robust
• Cyberinfrastructure architecture has converged to services-based Implementation
  - Grid Services -> Web Services (within the year)

We’ve needed a Software rallying point
• Regular release schedule should help provide the pacing that we need

Page 37

BIRN Core Software Infrastructure

Page 38

All Hands Meeting 2004

Page 39

The Forest of Stones, Xi’an

~2000 years old and still readable without technology

Page 40

Evolution of the Computational Infrastructure

Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group (CIWG)

A timeline from the Computational Infrastructure Division of the US National Science Foundation

[Timeline figure, axis 1985 to 2010 in five-year steps]
• Supercomputer Centers: SDSC, NCSA, PSC, CTC
• PACI: NPACI and Alliance
• Terascale: TCS, DTF, ETF
• Cyberinfrastructure
• Other labels: Prior Computing Investments; NSF Networking; Mosaic - Web Browser; GRID Term Coined ~ Metacomputing

Telescience: Access to Remote Resources