Integrative Biology BOF - Usable Systems in the Global Environment All Hands 2006


July 2006

Integrative Biology 1

Integrative Biology

BOF - Usable Systems in the Global Environment

All Hands 2006

Thursday 21st September


Agenda

• What is Integrative Biology? – a quick recap!

• Who are the IB users?

• Challenges in developing solutions for a diverse community

• The IB technology to date


Integrative Biology - Project Rationale

To leverage the global Grid infrastructure to build an international “collaboratory” which places the applications scientist “within” the Grid, allowing fully integrated and collaborative use of:

• HPC resources (capacity and capability)

• Computational steering, performance control and visualisation

• Storage and data-mining of very large data sets

• Easy incorporation of experimental data

• User- and science-friendly access

=> Predictive in-silico models to guide experiment and, ultimately, design of novel drugs and treatment regimes


What are our objectives?

Hide the complexity from the users through the use of an IB portal or client

[Diagram: Global Users connect through Integrative Biology to NGS RAL, NGS Ox Compute, NGS Man, NGS Leeds Compute, Atlas DataStore, HPCx (parallel optimal codes), the UCL Altix Test Machine, your own cluster, EU Grids and TeraGrid]


The Integrative Biology Scientific Users

Typically…

Degree and/or postgraduate qualification in industrial engineering, maths, biology or physiology

Computing skills developed over time to allow them to develop models. Not computer scientists. Not grid savvy.

Keen to use and adapt other scientists' work

Based in Oxford, Nottingham, Birmingham, Auckland, Tulane, Washington Lee, Calgary, Baltimore, Sheffield, Utrecht, Graz…


Determining requirements

• Evolving users, disparate needs, identify current pains

• Evolving knowledge driving new requirements

• Don’t know what they want until they see and refine it

• Grid not something they want to know about, consideration of language

• Initial interviews assessed the “as is” situation, constraints and security requirements for competitive research

• Concept of collaboration varied

• Do they need a grid? Exploratory journey for users


Key problems identified

• Data management problematic, too much generated and tying information together an art

• Current simulations tie up desktops for many hours

• Visualising results on desktop limited by local facilities and ad hoc development of suitable tools

• Research is sensitive, concept of an experiment either for an individual or a collaborative group

• Laptop to HPC migration for most users a huge leap not a small step

• Collaboration and Communication requires tools e.g. Oxford/Tulane

• Cannot exclude scientific community who have not progressed to computational models (digital pens)


The ‘collaboratory’ - What have we developed?

• Facilities for submission of compute jobs to NGS and HPCx via portal or command line or Matlab. Extension to own clusters in development

• Comprehensive data management and metadata management facilities, including federation of catalogues between Auckland and the UK

• Advanced visualisation techniques including movie generation, utilising Meshalyzer and Cool Graphics to date. Major revamp of these facilities due in the next 12-18 months for remote geometry generation and steering

• Phase space exploration for multivariable visualisation in Leeds

• A new VRE project developing usable interfaces to a digital research domain for IB through proofs of concept. Also exploring the digital world through a trial of digital pens for life scientists.


Job submission and management via the IB Portal

Users are able to select the compute resource to be used, manage their data in their own SRB space, and set up and manage their experiments through a metadata editor. Users can link files and simulation information to created studies, thus simplifying the process of managing their scientific information.

This portal allows users to submit their jobs to these compute facilities, monitor their progress, and automatically pull input files from and store results in the project's secure repository, the ‘Storage Resource Broker’ (SRB).


Data Management and the Metadata editor

The data storage facility allows users to store any associated user files including input files, codes and output results. Provenance data is automatically captured from a simulation run and stored alongside the results for later use. These facilities are designed to offer large scale secure facilities for the individual researcher as well as those interested in working more collaboratively with colleagues through the ability to share information.
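
The idea of capturing provenance from a simulation run and storing it alongside the results can be illustrated with a minimal Python sketch. This is not the IB implementation: the helper name `run_with_provenance`, the recorded fields and the `.prov.json` sidecar naming are all assumptions made for illustration, and the local filesystem stands in for SRB.

```python
import hashlib
import json
import time
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(simulate, params, input_files, result_path):
    """Run `simulate`, write its result, and store a provenance record beside it.

    Hypothetical helper: `simulate` is any callable that takes `params`
    (a dict) and returns a JSON-serialisable result.
    """
    result_path = Path(result_path)
    started = time.time()
    result = simulate(params)
    finished = time.time()

    # Store the simulation output itself.
    result_path.write_text(json.dumps(result))

    # Record what was run, on which inputs, and what it produced.
    record = {
        "parameters": params,
        "inputs": {str(p): sha256_of(p) for p in input_files},
        "result_file": result_path.name,
        "result_sha256": sha256_of(result_path),
        "started": started,
        "finished": finished,
    }

    # The sidecar sits next to the result, e.g. run1.json -> run1.json.prov.json
    prov_path = result_path.with_name(result_path.name + ".prov.json")
    prov_path.write_text(json.dumps(record, indent=2))
    return prov_path
```

Because the record is written automatically by the wrapper, the scientist does not have to remember to note down parameters or input versions by hand; a collaborator who is given the result file can read the sidecar to see how it was produced.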


Visualization

Cool Graphics/Meshalyzer (developed by Dr. J. Eason and Dr. E. Vigmond)

IB Tools

Link to SRB and NGS

Issues

Can only be done on local machine – problem for low-bandwidth users

… hence revised architecture, planned over next 12-18 months


Usable Solutions or lead weight?

• Early releases have required tame users to deal with less elegant means of submitting and managing jobs

• Constrained by infrastructure and agility of change

• VRE project aims to pull together multifaceted aspects

• Generic tools versus bespoke prototypes for selected groups e.g. Washington Lee parameter sweep

• Benefits for scientists have outweighed pains (certificates, varied rules re job queues, libraries and licensing), but…

• Far from an ideal solution. Constraints still exist (bandwidth, monitoring, security)


Scientific users are customers of technology

…. But the technology team are users of provided infrastructure…

– NGS

– HPCx

– (CSAR)

– SRB

– 3rd party tools …..


Benefits and challenges for users

• Benefits

– Access to powerful compute resources,

– Access to vast file store facility,

– Prompt, efficient support structures.

– New science evolving and publication rate for scientists faster!

• Challenges

– Need to apply for and manage certificates

– Code development for optimal use of facilities still a challenge

– Legacy code hurdle


Summary

• Integrative Biology has had to act as a bridge as well as a provider of interfaces and services

• Starting small and iterating with users patient enough to stick with it has enabled both teams to progress

• Security comes at a price

• Usable or tolerable?

…. But we have managed to increase publications for user community!