All Hands Meeting 2004
BIRN Coordinating Center Status Report
Mark Ellisman
Philip Papadopoulos
What is BIRN?
Biomedical Informatics & Research
Biocomplexity
Discovery and Systems research approaches complement Hypothesis-based research
Integrative, multidisciplinary team approach adapted for complex queries versus focused approach for hypothesis-driven research
Team approach more dependent on advanced technologies and instrumentation which generate large data sets
Information management at core of biomedical research for 21st century and beyond
Overview of the BIRN-CC Roadmap
Deliver and maintain a robust and scalable PRODUCTION Grid for the collaborative sharing, analysis and interrogation of biomedical data
Provide system integration to bring user applications into BIRN
Provide a consistent and scalable software delivery mechanism
Facilitate the use of advancing information technologies by biomedical scientists - “Cyberinfrastructure” and the “Grid”
Be the biomedical applications driver framing requirements for the rapidly evolving GRID infrastructure
“Enforce the AEIOU’s – Accessibility, Extensibility, Interoperability, Openness, Usability, Scalability”
Integrated Cyberinfrastructure System meeting the needs of multiple communities
Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group
Diagram labels:
• Applications: Environmental Science, High Energy Physics, Biomedical Informatics, Geoscience
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Development Tools & Libraries
• Grid Services & Middleware
• Hardware
• Distributed Resources (computation, communication, storage, etc.)
• Education and Training
• Discovery & Innovation
BIRN Core Software Infrastructure
• BIRN builds on evolving community standards for middleware
• Adds new capabilities required by projects
• Does system integration of domain-specific tools, building a distributed infrastructure
• Utilizes commodity hardware and stable networks for baseline connectivity
Diagram labels:
• BIRN Portal: Collaboration, Data Management, Viewing/Visualization, Pipelines, Queries/Results
• Web Server / Applications Server: Portlets, Grid Services
• BIRN Toolkit: Statistics/Analysis, Spatial, Ontology, PACS
• GridPort Services
• Custom APIs: mediator client, mediator gateway, registry, planner, executor
• Grid Middleware Services: GASS, GRAM, GSI, SRB, GridFTP, NWS, MDS/GRIS (File/Data, Job Management, Authentication, Information)
• Remote Servers / Sites: Data Sources, Applications, Computation
• Distributed Resources
Diagram labels:
• Friendly Work-Facilitating Portals: Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
• Your Specific Tools & User Apps.
• Shared Tools for Multiple Science Domains
• Development Tools & Libraries
• Grid Services & Middleware
• Distributed Computing, Instruments and Data Resources
Science domains shown:
• Biomedical Informatics “BIRN”
• High Energy Physics GriPhyN
• Geosciences “GEON”
• Bays and Rivers (Moore Found.)
• Earthquake “NEES”
• Ocean Observing “Looking”
BIRN is Pioneering: We are Making Unique and Fundamental Contributions to Establish Working GRIDs
BIRN is setting an example for other Grid project deployments [i.e. use of Rocks and automated distribution mechanisms]
• GEON, etc…
BIRN is a driver application for other major GRID initiatives
• Common security APIs being used within BIRN, Telescience, GEON
• OptIPuter - research into next generation networking - BIRN is the Bioscience Driver
• Drives requirements to the Global Grid Forum and Internet 2 development efforts
Grid Infrastructure in Action
The Grid is already having an impact…
• Many projects in many subjects: Life sciences, Medicine, Environment, Engineering, Materials, Chemistry, Physics
• BIRN embodies the most innovative use of data, metadata & portals
BIRN cited as successful model of grid computing.
The Grid is becoming the backbone for collaborative science and data sharing
BIRN Infrastructure Provides…
high performance connectivity between distributed resources (computation and data storage)
• JHU utilizing TeraGrid resources pulling data from SRB
secure access to large volumes of distributed data
distributed high performance computing resources
• BIRN just received an NSF Large Resource Allocation Committee award (450,000 service units, i.e. processor hours)
frameworks (standards, APIs, services) for the integration and interoperation of tools, users, data and computing resources
• Improved high level “wrapper” tools
• Common authentication protocol
• Intuitive user interfaces to access grid based computational analyses
• Transparent access to distributed data found within the BIRN Data Grid
Access to Grid Resources
Case Study: JHU - LDDMM grid computing launched from BIRN Portal
Semi Automatic Shape Analysis study utilizing compute intensive analyses (i.e. Large Deformation Diffeomorphic Metric Mapping)
BIRN has the Advantage of having Developed an “End-to-End” Infrastructure in the context of distributed biomedical research projects.
Consists of all the components required to effectively share and collaboratively explore data
• The BIRN Rack (BIRN site infrastructure)
• The BIRN Virtual Data Grid
• The BIRN Mediation Infrastructure
• The BIRN Portal
The system integration, development, deployment and management of this infrastructure is the main focus of activities within the BIRN Coordinating Center
Continually improve the BIRN software infrastructure (i.e. performance, robustness, end-to-end integration, and interoperability)
Standardize the software delivery process by providing twice yearly scheduled software releases – April & October
• Develop internal processes for alpha, beta and production releases
• Instantiate robust development, staging and production environments
• Improved documentation and tutorials for all components
Provide automated deployment mechanisms
Improving the BIRN Environment
BIRN Portal
Updated BIRN Portal with new and improved features currently in production
Worked with test beds to improve the usability and performance of the BIRN Portal
• Improved performance
• Updated Portal API for more robust operation
• Implemented guest pages and accounts
• Enhanced security and integration with the BIRN Authentication infrastructure
• Updated look and feel for improved usability
• Providing online documentation & tutorials
The problem with portals …
… is that you rely on the integrity of the gatekeeper
Benefits of a Data Grid
Uniform interface for connecting to heterogeneous distributed data resources
• Allows for any “grid enabled” tool to interact with data no matter where it is located or what it is located on
Allows for the seamless creation and management of distributed data sets
• Distributed data appear as a single managed collection both to users and tools
Access is managed using GRID authentication through the BIRN Portal
Security: Access and Audit
• Intuitive interfaces to core infrastructure (e.g. the BIRN Virtual Data Grid) and services (e.g. full auditing on BIRN data or image viewing)
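The "single managed collection" idea can be illustrated with a minimal sketch. This is not the SRB API: `Replica`, `LogicalCollection`, the site labels and all paths are hypothetical names invented for illustration. It only shows how one logical namespace can hide where, and on what kind of storage, each item physically lives.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str     # hypothetical site label
    backend: str  # hypothetical storage type, e.g. "posix" or "archive"
    path: str     # physical location at that site


@dataclass
class LogicalCollection:
    """Toy catalog: one logical name maps to one or more physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        # Adding a replica never changes the logical name users see.
        self.entries.setdefault(logical_name, []).append(replica)

    def names(self, prefix=""):
        # Users and tools browse a single sorted namespace.
        return sorted(n for n in self.entries if n.startswith(prefix))

    def resolve(self, logical_name, preferred_site=None):
        # Pick a replica at the preferred site if one exists, else the first one.
        replicas = self.entries[logical_name]
        for replica in replicas:
            if replica.site == preferred_site:
                return replica
        return replicas[0]


catalog = LogicalCollection()
catalog.register("/birn/study1/scan.img", Replica("site-a", "posix", "/data/scan.img"))
catalog.register("/birn/study1/scan.img", Replica("site-b", "archive", "/vault/0001"))
catalog.register("/birn/study2/atlas.img", Replica("site-b", "posix", "/export/atlas.img"))
print(catalog.names("/birn/"))
```

A tool written against `names` and `resolve` does not need to know which site or storage system holds the bytes, which is the property the slide attributes to a data grid.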
Google is not a portal…
Carrot juice cures piles
A result?
From Ken Peach,
Rutherford Labs UK
If you dig deep enough you may get what you want (but perhaps not exactly what you need)
Carrot juice cures piles
1,680
Drink a juice of turnip leaves, spinach, water cress and carrots (equal quantity)
Example of Data Mediation within BIRN
Find all joint projects between UCSD and Duke w/ relevance to Lewy Body Disease
Benefits of Data Mediation
Provide means to locate, access and interrogate data contained in distributed databases
Can add new resources without modifying existing data resources
Promote flexible views on top of the data
Semantically and spatially integrate multi-scale and multi-modal data
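A toy sketch of the mediation idea, under stated assumptions: the `Mediator` class, the wrapper functions, the two source schemas and every record below are invented for illustration and are not the BIRN mediator's interface. Each source keeps its own schema; a small wrapper maps it onto a shared view, so a new source is added by registering a wrapper rather than by changing existing sources.

```python
# Two hypothetical sources with different schemas (all records invented).
SOURCE_A = [
    {"proj": "P1", "partners": ["UCSD", "Duke"], "keywords": ["Lewy Body Disease"]},
    {"proj": "P2", "partners": ["UCSD"], "keywords": ["Alzheimer's"]},
]
SOURCE_B = [
    {"title": "P3", "sites": "Duke;UCSD", "topic": "Lewy Body Disease"},
    {"title": "P4", "sites": "Duke;Harvard", "topic": "Lewy Body Disease"},
]


def wrap_a(row):
    # Map source A's schema onto the shared view.
    return {"project": row["proj"], "sites": set(row["partners"]),
            "topics": set(row["keywords"])}


def wrap_b(row):
    # Map source B's schema onto the same shared view.
    return {"project": row["title"], "sites": set(row["sites"].split(";")),
            "topics": {row["topic"]}}


class Mediator:
    def __init__(self):
        self.sources = []

    def register(self, rows, wrapper):
        # New resources are added here; existing sources are untouched.
        self.sources.append((rows, wrapper))

    def query(self, predicate):
        # Evaluate one predicate over the unified view of every source.
        return [view for rows, wrapper in self.sources
                for view in map(wrapper, rows) if predicate(view)]


def joint_projects(mediator, site_1, site_2, topic):
    hits = mediator.query(
        lambda v: {site_1, site_2} <= v["sites"] and topic in v["topics"])
    return sorted(h["project"] for h in hits)


mediator = Mediator()
mediator.register(SOURCE_A, wrap_a)
mediator.register(SOURCE_B, wrap_b)
print(joint_projects(mediator, "UCSD", "Duke", "Lewy Body Disease"))
```

The caller asks one question and never sees that the answer was assembled from two differently shaped sources.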
BIRN Data Mediation
Version 2.0 of the BIRN mediator is currently in alpha testing (i.e. as a core component of the BIRN 2.0 release)
Improvements to the new release
• Enhanced query performance
• Updated registration, query and view building tools
• Support for PostgreSQL databases
• Integrated with BIRN authentication infrastructure
BIRN-CC is exploring additional data mediation approaches with collaborators
• Yale - Query Integrator System (QIS)
• GEON - IBM Information Integrator
From Vision to Reality
“It’s all in the software”
• “It’s not a bug, it’s a feature”
• “That will be in the next version”
• “When is the next version?”
“I just want to open a file”
“I need to monitor and control who accesses my data”
“How do I locate data of interest to me?”
“I need a boatload of computing, how do I find it?”
“Why the heck isn’t this easier?”
New sites and collaborative projects are being added
BIRN Grid Testbed Sites
We Began with Standard Hardware
This jumpstarted BIRN for functionality
Software footprint is managed from the BIRN Coordinating Center
Integration of domain tools, middleware, OS, updates, and more
BIRN expansion/upgrade of existing sites must have a more generic (and less expensive) hardware footprint
BIRN CC Software Concerns & Operations
Deploy/Manage/Update Common Services
• Portal/Website
• Security Infrastructure
• Metadata Catalog – SRB MCAT
• Mediator Registry
• Source code repository
• Java Application Servers
Deploy/Manage/Update Site Racks
• Enterprise Linux
• Databases and Data Grid Clients
• Mediated Data Resources
• BIRN applications (e.g. LONI, 3D-Slicer, FreeSurfer, …)
“It’s all in the Software”
Critical Issues
• What is the BIRN Software Stack?
• When is it updated?
• What Services are supported?
Integrated releases of all BIRN software
• Defining components: Input/SW from all of BIRN
  Candidate software – 3 months prior to release
  Alpha phase (functionality freeze) – 2 months prior to release
• Defined Schedule
  April/October releases - 1 month beta cycle
Pre-alpha is defined now – Part of this meeting should be to prioritize components for April ’05.
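The release calendar above reduces to simple date arithmetic. The sketch below works the milestones backward from a release date; the function names are hypothetical, and pinning every milestone to the first day of the month is an assumption made only to keep the example concrete (the slides give months, not days).

```python
from datetime import date


def months_before(d, n):
    """Return the first day of the month n months before d (day-of-month is assumed)."""
    index = d.year * 12 + (d.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def milestones(release):
    # Offsets taken from the slide: candidate 3 months, alpha 2 months,
    # beta 1 month before the April or October release.
    return {
        "candidate": months_before(release, 3),
        "alpha (functionality freeze)": months_before(release, 2),
        "beta": months_before(release, 1),
        "release": release,
    }


for name, when in milestones(date(2005, 4, 1)).items():
    print(f"{name}: {when:%B %Y}")
```

Under these assumptions, an April ’05 release implies candidate software in January, the functionality freeze in February and the beta cycle starting in March.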
More on Software Releases
Defined release cycles are intended to
• Provide software stability for users and developers
• Allow everyone to plan on when system changes will occur
As a whole, BIRN will need to prioritize what goes into a release
• There are limited people and testing resources
Transferring software is not a trivial task
• Packaging uncovers system assumptions
We use Rocks to define “appliances”
• 100% automated configuration of endpoints and services
• BIRN tools need to be transferable to other NIH projects
I Just Want to Open a File …
BIRN has been built upon data collections
• Data was copied in/out of data grid
• Metadata allows transparent location/querying
• Requires scripts/changes to code
Distributed File System Layer
• Experimenting with AFS
• Feasibility/performance of developing SRBFS not clear
BIRN Application Workflow
Mediated Data
Data CollectionDB
Distributed File System
Local/NFS File System
Mediator – under development (v 2.0)
Oracle/Postgres - SRB
Standard OS
I Need to Monitor Access to My Files
Authentication (Identification)
• GSI Certificates
• Managed transparently by the Portal – username/password
• Have developed a Java class to encapsulate GSI functionality to ease the development of GSI-aware software
Access control already built in to Data Collection Management (authorization)
• As we introduce other data modalities, we need to develop a vocabulary that is useful
  Translate to specific software systems
• E.g. SRB, Oracle/Postgres Table Security, AFS, GridFTP, …
• There is a dearth of community tools to build upon here
  BIRN can help drive the community
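One way to picture "a vocabulary that translates to specific software systems" is a small table from generic rights to per-backend terms. Everything here is an assumption for illustration: the two-word vocabulary, the `translate` helper and the choice of mapping are not a BIRN design. The target strings are ordinary PostgreSQL privilege keywords and AFS ACL right letters, used only as examples of backend-specific terms.

```python
from enum import Enum


class Right(Enum):
    # Hypothetical generic vocabulary; a real one would need more terms.
    READ = "read"
    WRITE = "write"


# Illustrative mapping only: PostgreSQL privilege keywords and AFS ACL letters.
TRANSLATIONS = {
    "postgres": {
        Right.READ: ["SELECT"],
        Right.WRITE: ["SELECT", "INSERT", "UPDATE", "DELETE"],
    },
    "afs": {
        Right.READ: ["rl"],
        Right.WRITE: ["rlidwk"],
    },
}


def translate(grants, backend):
    """Turn {user: Right} into {user: backend-specific terms}."""
    table = TRANSLATIONS[backend]
    return {user: table[right] for user, right in grants.items()}


policy = {"alice": Right.READ, "bob": Right.WRITE}
print(translate(policy, "postgres"))
print(translate(policy, "afs"))
```

The policy is stated once in generic terms; each backend gets its own rendering, which is the translation step the slide calls for.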
I need to locate data of interest to me
Two ways now:
• Metadata attached to collection-managed data
  “Retrieve all DAT-KO MRI images”
• Data Mediator
  Gives the illusion of a single database
  New relationships among separate databases
What about distributed file systems?
• You get pathnames only
• You can vi (or emacs) a file – that is, read/write/open/close works as expected
• It is reasonable to look at a DFS as a stepping stone
• Very useful as community working directories where metadata is less important, but access control is critical
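The contrast between metadata lookup and pathname-only access can be sketched in a few lines. The catalog entries, attribute names (`modality`, `strain`) and paths are all invented; this is not the SRB MCAT schema, only an illustration of the two styles of finding data.

```python
from fnmatch import fnmatch

# Hypothetical catalog: each item carries a path plus descriptive metadata.
CATALOG = [
    {"path": "/collections/mouse/a/scan01.img", "modality": "MRI", "strain": "DAT-KO"},
    {"path": "/collections/mouse/a/scan02.img", "modality": "MRI", "strain": "wild-type"},
    {"path": "/collections/mouse/b/slice07.img", "modality": "histology", "strain": "DAT-KO"},
    {"path": "/collections/mouse/b/scan11.img", "modality": "MRI", "strain": "DAT-KO"},
]


def find_by_metadata(catalog, **criteria):
    # Collection-managed data: select on what the data is.
    return [item["path"] for item in catalog
            if all(item.get(key) == value for key, value in criteria.items())]


def find_by_path(catalog, pattern):
    # Distributed file system: all you have to go on is the pathname.
    return [item["path"] for item in catalog if fnmatch(item["path"], pattern)]


print(find_by_metadata(CATALOG, modality="MRI", strain="DAT-KO"))
print(find_by_path(CATALOG, "*/scan*.img"))
```

The metadata query returns exactly the DAT-KO MRI images, while the path pattern also picks up the wild-type scan, because nothing in the pathname encodes the strain.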
I need a boatload of computing …
JHU has been experimenting with using TeraGrid and loading data into the BIRN Data Grid
• Their storage resources are at 90+%
Condor is deployed on Racks, but
• We need to look at use cases and utility
• Automated data management (“move my data to the computing”) is still clumsy at best
Pathfinder applications help to more crisply define the software stack
Why the heck isn’t this easier?
It really hasn’t been done before
• A significant number of dimensions
  Application usage
  Security requirements
  Scale of data and of distributed systems
• Software is evolving to be more robust
• Cyberinfrastructure architecture has converged to services-based implementation
  Grid Services -> Web Services (within the year)
We’ve needed a software rallying point
• Regular release schedule should help provide the pacing that we need
~2000 years old and still readable without technology
The Forest of Stones, Xi’an
Evolution of the Computational Infrastructure
Source: Dr. Deborah Crawford, Chair, NSF CyberInfrastructure Working Group (CIWG)
A timeline (1985 to 2010) from the Computational Infrastructure Division of the US National Science Foundation
• Supercomputer Centers: SDSC, NCSA, PSC, CTC
• PACI: NPACI and Alliance
• Terascale: TCS, DTF, ETF
• Cyberinfrastructure
• Other labels: Prior Computing Investments; NSF Networking; Mosaic - Web Browser; GRID Term Coined ~ Metacomputing
Telescience: Access to Remote Resources