UK e-Science Program
Core Centres, 2001 (EPSRC)
Research Council pilot projects
– GODIVA ocean grid (NERC)
– GENIE Earth system (NERC)
– e-Minerals (NERC)
– e-Biodiversity (BBSRC)
Open EPSRC call for new e-Science Centres
Reading e-Science Centre (Nov. 2003)
– Resources: Access Grid Node
Technical Director: Jon Blower
The Reading e-Science Centre (ReSC)
Jon Blower
Technical Director
http://www.resc.rdg.ac.uk
Aims of the ReSC
Promote e-Science methods in the environmental science community
– CGAM, DARC, ESSC, JCMM, NCAS (all at Reading)
Act as a focus for all e-Science activities in Reading
Provide expertise, help and support for these activities
Reach out to government agencies and industry
– especially the Met Office and the Environment Agency
– British Maritime Technology
What is e-Science?
“science increasingly done through distributed global collaborations enabled by the Internet, using very large data collections, terascale computing resources and high performance visualization”
What is e-Science? (2)
Easier definition: “Collaborative science using distributed computing”
Who can benefit?
– Users of lots of computing power
– Users of large datasets
– Users of very distributed datasets
– Scientists who work across geographical and institutional boundaries
Easier to explain with some concrete examples
Case Studies
Case 1: Ensemble modelling
The Problem:
– Climate is sensitive to very many factors. How do we work out which factors are most important in determining our future climate?
The Solution:
– Run (fairly simple) simulations many, many times over with different parameters (an ensemble run)
– climateprediction.net: participants all over the world run the model on their home PCs
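A perturbed-parameter ensemble is simple to express in code. Every combination of parameter values defines one independent model run, which is why the runs can be farmed out to separate home PCs. The sketch below uses a hypothetical zero-dimensional "model" (equilibrium warming = forcing / feedback parameter) purely to show the structure. It is not the climateprediction.net model, and the parameter values are illustrative assumptions.

```python
import itertools


def toy_model(feedback: float, forcing: float) -> float:
    """Hypothetical zero-dimensional climate model (illustration only).

    Equilibrium warming (K) = radiative forcing (W m^-2) / feedback parameter (W m^-2 K^-1).
    """
    return forcing / feedback


def run_ensemble(parameter_sets: dict) -> list:
    """Run the model once for every combination of parameter values.

    Each member is independent of the others, so in a distributed setting
    each combination could be sent to a different machine.
    """
    names = list(parameter_sets)
    members = []
    for values in itertools.product(*(parameter_sets[n] for n in names)):
        params = dict(zip(names, values))
        members.append((params, toy_model(**params)))
    return members


if __name__ == "__main__":
    ensemble = run_ensemble({"feedback": [0.8, 1.2, 1.6, 2.0], "forcing": [3.7]})
    warmings = [w for _, w in ensemble]
    print(f"{len(ensemble)} members, warming range {min(warmings):.2f} to {max(warmings):.2f} K")
```

The spread of results across members is what an ensemble study examines. With a real model, each call to `toy_model` would be hours of computation, and that is what makes distributing the members worthwhile.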
climateprediction.net results
• Already the largest climate model ensemble ever (by a factor of >200)
• >45,000 users, >15,000 complete model runs, >1,000,000 model years in ~3 months (equivalent to 1.5 Earth Simulators)
• Large range of sensitivities found [figure: range of climate sensitivities, labels 2K and 10K]
• Global outreach (participants on all 7 continents, including Antarctica!)
• Generated much interest in schools (coolkidsforacoolclimate.com)
Case 2: Sharing large datasets
The Problem:
– There are many different models of ocean circulation, and we would like to compare and visualize the results. But there are lots of different data formats, and there’s lots of data!
The Solution:
– Create an Internet-based service that allows users to cut out just the data they want and get it in the format they want (the Grid Access Data Service, GADS)
– Developed under the GODIVA project
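The core GADS idea is to let users cut out only the region they need, so that only that subset crosses the network. It can be sketched in a few lines. The grid layout (a regular latitude/longitude grid held as nested lists) and the function name `subset_box` are assumptions for illustration. This is not the GADS interface.

```python
def subset_box(field, lats, lons, lat_range, lon_range):
    """Return the part of a 2-D field inside a latitude/longitude box.

    field[i][j] holds the value at latitude lats[i] and longitude lons[j].
    Bounds are inclusive. Returns (sub_lats, sub_lons, sub_field).
    """
    lat_lo, lat_hi = lat_range
    lon_lo, lon_hi = lon_range
    rows = [i for i, lat in enumerate(lats) if lat_lo <= lat <= lat_hi]
    cols = [j for j, lon in enumerate(lons) if lon_lo <= lon <= lon_hi]
    sub_field = [[field[i][j] for j in cols] for i in rows]
    return [lats[i] for i in rows], [lons[j] for j in cols], sub_field


if __name__ == "__main__":
    lats = [-60, -30, 0, 30, 60]
    lons = [0, 90, 180, 270]
    # Synthetic field: each value encodes its own (row, column) position so the subset is easy to check.
    field = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]
    print(subset_box(field, lats, lons, (0, 60), (90, 180)))
```

In a service, this selection would run on the machine holding the data. The user would then download only the small `sub_field`, not the whole file.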
GODIVA Web Portal
• Allows users to interactively select data for download using a GUI
• Users can create movies on the fly
• cf. Live Access Server
Case 3: Highly distributed data
The Problem:
– To study the genetic origins of a disease, scientists must interrogate many data sources, performing in silico experiments to test hypotheses
The Solution:
– Provide Web Services to access these data sources, and a means of combining these Services into workflows
– These workflows can be shared between scientists, so experiments can easily be repeated
– myGrid project is doing just this (www.mygrid.org.uk)
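Workflows make experiments repeatable partly because the workflow itself can be stored as data and passed between scientists. The sketch below stores a workflow as a JSON list of step names and runs it against a registry of services. The "services" here are local Python functions standing in for remote Web Services. The step names and JSON layout are hypothetical, and Taverna uses its own workflow language.

```python
import json

# Hypothetical stand-ins for remote Web Services: each takes one value and returns one value.
SERVICES = {
    "fetch_sequence": lambda gene_id: f"SEQ({gene_id})",
    "find_similar": lambda seq: [seq + "-hit1", seq + "-hit2"],
    "count_hits": lambda hits: len(hits),
}


def run_workflow(workflow_json: str, initial_input):
    """Run each named step in order, feeding each step's output into the next."""
    steps = json.loads(workflow_json)["steps"]
    value = initial_input
    for step in steps:
        value = SERVICES[step](value)
    return value


if __name__ == "__main__":
    # The workflow definition is plain text, so it can be emailed, versioned and rerun.
    shared = json.dumps({"steps": ["fetch_sequence", "find_similar", "count_hits"]})
    print(run_workflow(shared, "GENE42"))
```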
The Taverna workbench
Each blob on the diagram is a Web Service
Flexible way of creating a distributed application
taverna.sourceforge.net
e-Science concepts
e-Science buzzwords
The GRID
– highly heterogeneous network of supercomputers, clusters and commodity machines (and one PS2!)
– cf. power grids (a long way off!)
– not all e-Science is done on The GRID (in fact, most isn’t at the moment)
Interoperability / standards
– absolutely necessary for working together and avoiding duplication of effort
Metadata and Semantics (“The Semantic Web”)
– Metadata = “data about data”, vital for discovering data resources
– Meaning of data (semantics) must be precisely specified
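Metadata make data discoverable because a search can run over small descriptive records rather than over the data themselves. The sketch below assumes a hypothetical catalogue in which each record gives a dataset name, the variables it contains and its bounding box. These field names are illustrative and do not follow any particular metadata standard.

```python
# Hypothetical metadata catalogue ("data about data"), searched without opening any data files.
# Bounding boxes are (lat_min, lat_max, lon_min, lon_max); wrap-around at the dateline is ignored.
CATALOGUE = [
    {"name": "ocean_model_run_A",
     "variables": {"sea_water_temperature", "sea_water_salinity"},
     "bbox": (-80.0, 80.0, -180.0, 180.0)},
    {"name": "north_atlantic_obs",
     "variables": {"sea_water_temperature"},
     "bbox": (20.0, 70.0, -80.0, 0.0)},
    {"name": "uk_rainfall",
     "variables": {"precipitation_amount"},
     "bbox": (49.0, 61.0, -11.0, 2.0)},
]


def boxes_overlap(a, b):
    """True if two (lat_min, lat_max, lon_min, lon_max) boxes intersect."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def discover(variable, bbox, catalogue=CATALOGUE):
    """Return names of datasets that contain `variable` and cover part of `bbox`."""
    return [rec["name"] for rec in catalogue
            if variable in rec["variables"] and boxes_overlap(rec["bbox"], bbox)]


if __name__ == "__main__":
    print(discover("sea_water_temperature", (50.0, 60.0, -30.0, -10.0)))
```

This works only if everyone agrees what "sea_water_temperature" means. That is the semantics point on the slide.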
The tools of the trade
Middleware
– software that “glues together” existing systems and connects people with distant resources
Condor
– manages the task of running jobs over several computers
Globus (Toolkit)
– most popular middleware; handles authentication, job submission, etc.
– version 3 is very different from previous versions; it’s based on…
Web Services
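Condor decides where each job runs by matching what a job requires against what each machine advertises (Condor expresses both as ClassAds). The sketch below is a much-simplified, hypothetical version of that matchmaking idea, using plain dictionaries and a first-fit rule. It does not reproduce Condor's ClassAd language or its ranking and scheduling policies, and the machine and job attributes are invented for illustration.

```python
def matches(job, machine):
    """Hypothetical requirement check: same OS and enough memory on the machine."""
    return job["os"] == machine["os"] and machine["memory_mb"] >= job["memory_mb"]


def match_jobs(jobs, machines):
    """First-fit matchmaking: give each job the first idle machine that satisfies it.

    Returns {job_name: machine_name}. Unmatched jobs are left out; in a real
    pool they would simply wait in the queue.
    """
    idle = list(machines)  # copy, so the caller's list is not modified
    assignment = {}
    for job in jobs:
        for machine in idle:
            if matches(job, machine):
                assignment[job["name"]] = machine["name"]
                idle.remove(machine)  # in this sketch, a machine runs one job at a time
                break
    return assignment


if __name__ == "__main__":
    machines = [
        {"name": "node1", "os": "LINUX", "memory_mb": 512},
        {"name": "node2", "os": "LINUX", "memory_mb": 2048},
        {"name": "node3", "os": "WINDOWS", "memory_mb": 1024},
    ]
    jobs = [
        {"name": "diagnostics", "os": "LINUX", "memory_mb": 1024},
        {"name": "plotting", "os": "LINUX", "memory_mb": 256},
        {"name": "big_run", "os": "LINUX", "memory_mb": 4096},
    ]
    print(match_jobs(jobs, machines))
```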
Web Services
“Black box” subroutine that can be accessed over the Internet
Platform- and language-neutral
– e.g. code can run on Solaris but be called from Mac, Windows, Linux, etc., in any language
Huge industry backing
– IBM, Microsoft, Sun, etc.
Grid Services extend WS for long-lived jobs
– notification of progress, persistence of data, etc.
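Python's standard-library XML-RPC modules can demonstrate the "black box subroutine over the Internet" idea. XML-RPC is a simpler relative of the SOAP-based Web Services discussed here, so treat this as an illustration of the principle rather than of Grid or SOAP tooling. The function name `area_mean` is hypothetical. The client needs only the URL and the function name, not the server's platform or language.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def area_mean(values):
    """The 'black box' subroutine: callers never see how it is implemented."""
    return sum(values) / len(values)


def call_remote_mean(values):
    """Start a local XML-RPC server, call area_mean over HTTP, then shut the server down."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)  # port 0: pick any free port
    server.register_function(area_mean, "area_mean")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
            # Arguments and result travel as XML over HTTP; any XML-RPC client could make this call.
            return proxy.area_mean(values)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote_mean([1.0, 2.0, 3.0]))
```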
Workflows
Web Services can be composed into “workflows” to create a distributed application
– hot topic of research and debate in e-Science
Lots of standards and tools to do this, but no one clear “winner” yet
BPEL is popular, but really designed for business-to-business (B2B) interaction
Example workflow
[Diagram: workflow boxes labelled Extract dataset 1, Extract dataset 2, Convert format, Compare datasets, Perform diagnostics, Visualize results]
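The example workflow can be sketched as ordinary function composition. Each function below is a hypothetical local stand-in for one box in the diagram; in practice each would be a remote Web Service. The data are made-up lists. The ordering, with both extractions happening before a format conversion and then the comparison, is an assumed reading of the diagram.

```python
def extract_dataset(source, region):
    """Stand-in for an 'Extract dataset' service: return the values for one region."""
    return source[region]


def convert_format(values, scale):
    """Stand-in for 'Convert format': here, just a unit conversion."""
    return [v * scale for v in values]


def compare_datasets(a, b):
    """Stand-in for 'Compare datasets': point-by-point differences."""
    return [x - y for x, y in zip(a, b)]


def perform_diagnostics(diffs):
    """Stand-in for 'Perform diagnostics': mean and maximum absolute difference."""
    return {"mean_diff": sum(diffs) / len(diffs), "max_abs_diff": max(abs(d) for d in diffs)}


def visualize_results(stats):
    """Stand-in for 'Visualize results': a one-line text summary."""
    return ", ".join(f"{k}={v:.2f}" for k, v in sorted(stats.items()))


def run_example_workflow(model_a, model_b, region):
    a = extract_dataset(model_a, region)
    # Hypothetical: model B stores temperatures in tenths of a degree, so convert before comparing.
    b = convert_format(extract_dataset(model_b, region), scale=0.1)
    return visualize_results(perform_diagnostics(compare_datasets(a, b)))


if __name__ == "__main__":
    model_a = {"north_atlantic": [10.0, 12.0, 14.0]}
    model_b = {"north_atlantic": [95.0, 125.0, 150.0]}
    print(run_example_workflow(model_a, model_b, "north_atlantic"))
```

Because each step has a single, well-defined input and output, any one of them could be swapped for a remote service without changing the others.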
Visualization
Key component of many e-Science projects
Vital for validating models and finding features of interest
– not just “pretty pictures”
Can do collaborative visualization
– several groups can look at the same thing at the same time
– e.g. mammography in hospitals
Real-time visualization of model results permits computational steering
– RealityGrid (www.realitygrid.org)
– explore parameter space much more quickly
GODIVA visualization
Adaptive meshing gives data compression with little visible degradation
– 60 × 60 × 66 data points (~¼ million), reduced by a factor of ~10
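First, the arithmetic on the slide: 60 × 60 × 66 = 237,600 points (about ¼ million), so a tenfold reduction leaves roughly 24,000 values. The slides do not say which adaptive-meshing scheme GODIVA uses. The sketch below illustrates the general idea with a simple 2-D quadtree, an assumed stand-in. A square block is stored as a single value when its values vary by no more than a tolerance; otherwise it is split into four quarters. Uniform regions therefore cost few values, while sharp features keep full resolution.

```python
def build_quadtree(field, x0, y0, size, tol):
    """Return leaf cells (x0, y0, size, value) covering a size x size block of field.

    field is a list of rows (field[y][x]); size must be a power of two.
    A block becomes one leaf (storing its mean) when max - min <= tol;
    otherwise it is split into four quarters.
    """
    block = [field[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    if size == 1 or max(block) - min(block) <= tol:
        return [(x0, y0, size, sum(block) / len(block))]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(build_quadtree(field, x0 + dx, y0 + dy, half, tol))
    return leaves


def reconstruct(leaves, n):
    """Rebuild an n x n field by filling each leaf's block with its stored value."""
    field = [[0.0] * n for _ in range(n)]
    for x0, y0, size, value in leaves:
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                field[y][x] = value
    return field


if __name__ == "__main__":
    print(60 * 60 * 66)  # number of points in the GODIVA example
    n = 64
    # Synthetic field: two uniform regions separated by one sharp front at x = 40.
    field = [[0.0 if x < 40 else 1.0 for x in range(n)] for _ in range(n)]
    leaves = build_quadtree(field, 0, 0, n, tol=0.0)
    print(f"{n * n} points stored as {len(leaves)} leaves")
```

With `tol = 0` the compression is lossless. A positive tolerance trades a bounded error per cell for fewer leaves, which is the "little visible degradation" trade-off.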
Back to the ReSC...
Why ReSC?
Centre of Excellence in Environmental e-Science
– Reading University has strong links with the Met Office and the Environment Agency
Support existing Reading e-Science activities
– in ESSC, Comp Sci, Plant Sciences, etc
– acts as focus and central point of contact
– not just environmental e-Science
Complements NIEeS
– National Institute for Environmental e-Science in Cambridge
– www.niees.ac.uk
Who are we?
Two co-Directors
– Keith Haines (ESSC)
– Rachel Harrison (Computer Science)
Technical Director (first point of contact)
– Jon Blower (ESSC)
Many Associates
– Mike Evans, Lizzie Froude, Kevin Hodges, Chunlei Liu, Kecheng Liu, Adit Santokhee
– join us!
What are we doing?
Building Reading e-Science community
– Comp Sci, Met Dept, CGAM, DARC, Plant Sciences
Building infrastructure
– building a Condor pool between ESSC and Comp Sci, to be extended further in future
– bidding for a dedicated compute cluster
Building software
– Web Services for environmental data access and manipulation
Outreach to government agencies and industry
– BMT, ECMWF, MCA, SEEDA
– using Reading Enterprise Hub
ReSC projects
Flexible Online Environmental Data Systems (EDAS)
– SEEDA project
– delivery of live Met Office data to end users
– e.g. BMT for search and rescue / oil spill mitigation
GODIVA
– Grid for Ocean Diagnostics, Interactive Visualization and Analysis
GADS
– Grid Access Data Service
Lizzie Froude’s PhD studentship
– storm tracking diagnostics on large, distributed data sets
Lots more going on in Reading
– e.g. BiodiversityWorld (Computer Science)
How you can get involved
Talk to us!
Join the Reading University e-Science mailing list
– [email protected]
Read our website: www.resc.rdg.ac.uk
Use the Wiki site to share ideas
– Register expertise and interests
– Share documents that might be of general use
What we can do for you
Provide technical expertise
– e.g. on Web Services, workflow, etc.
Provide advice on getting funding
Help find collaborators, resources etc
Provide computational resources
Provide live data
Provide Access Grid for use
The Access Grid
What is the Access Grid?
(not to be confused with The GRID!)
State-of-the-art videoconferencing suite
Can hold meetings with many sites at once
– everyone can see and hear everyone else
Reduces travel costs and saves lots of time
Uses high-speed internet
– no running costs!
Easy to operate
– don’t need a dedicated technician
In conclusion…
ReSC is here to support all Reading e-Science activity
We specialise in environmental e-Science
We’re always looking for new projects to be involved in
Many potential future projects
– especially in the area of delivery of real-time Met Office or Environment Agency data
– engage GIS community
Let us know what you would like us to do!
– [email protected]
Other environmental e-Science projects
GENIE
Grid-Enabled Integrated Earth System model
Aims to create a distributed, component-based model of the Earth system
Will study long-term climate change and palaeoclimate
Will incorporate components representing atmosphere, ocean, land surface, ice, ocean and land biogeochemistry, ocean sediments
Developing novel computing techniques for model framework, integration, data management, visualization
www.genie.ac.uk
GENIE (contd.)
Response of Atlantic circulation to freshwater forcing
New ways of working:
– Web Portal for composing + executing simulations, retrieving results
– Use of flocked Condor pools (London, Soton) and Beowulf clusters
– Data client for post-processing
GENIE (contd.)
3 international collaborators (Japan, US, Switzerland)
Involvement in international projects: PRISM, EMIC, GAIM
4 oral and 2 poster presentations at EUG/AGU (Nice), IUGG (Japan), AHM 03
4 refereed journal papers (1 in press, 3 submitted)
Engagement with industry (50K each from Intel, Compusys for meetings)
~20 people at present using shared code repository
– Tyndall Centre will use code in integrated assessment model
GODIVA
Grid for Ocean Diagnostics, Interactive Visualisation and Analysis
Aims to quantify the thermohaline circulation via analysis of model results and observational data
Developing Web Services for performing common tasks on oceanographic data:
– data extraction, processing, analysis, visualisation
These Services will be composed into “workflows” to create flexible, distributed applications
– collaborating with other e-Science projects (e.g. myGrid) in this matter
GODIVA progress
Talks/demonstrations at the All Hands Meeting and SCGlobal 2003
Created prototype client application:
– extracts live data and performs 3-D rendering
Also created data portal providing global access to data (next slide)
Will engage GIS community (e.g. MarineGIS project in Ireland)
www.nerc-essc.ac.uk/godiva
GODIVA Data Portal
Web-based, similar to Live Access Server
Users select area of interest and can download data or create movies in matter of seconds or minutes
Uses distributed computing for visualisation
NERC Data Grid
Objective is to build a grid that makes data discovery, delivery and use much easier than it is now
Standards-compliant (ISO 19115, 19118), semantic data model for maximum interoperability
Data can be stored in many different ways (flat files, databases…)
Clear separation between discovery and use of data
1 PI, 2 co-Investigators, 4 FTE staff, 3 registered US collaborators
ndg.nerc.ac.uk
NERC Data Grid progress
Involved in many UK events (All Hands, Met Soc, NIEeS workshops etc)
Generated much international interest (US, France, Netherlands, Australia…)
Major challenges:
– influencing OGC and ISO to support the complex requirements of the climate simulation community
– Developing a “feature-registry” to allow semantics of data types to be well understood by different communities
climateprediction.net
Has created an extremely powerful, distributed climate modelling facility by running model simulations on home computers (cf. SETI@home)
Launch ensemble of coupled simulations of 1950-2000 and compare with observations.
Run on to 2050 under a range of natural and anthropogenic forcing scenarios.
Investigates sensitivity of climate system to increasing CO2 with range of parameter values
Have collaborated with other universities and industry to build system
e-Minerals
Models the atomistic processes involved in environmental issues (radioactive waste disposal, pollution, weathering)
– simulation of radiation damage (Daresbury)
– order-N quantum mechanical model of fluids (Cambridge)
– complex fluid-mineral interfaces: crystal growth and dissolution (Bath)
Developing new methods
– embedded clusters: links simulations of various sophistication to cover greater ranges of scales
– first use of quantum Monte Carlo techniques in mineral sciences
eminerals.org
e-Minerals (contd.)
Have constructed a minigrid across institutions to run code
– ~30 scientists in 8 institutions
Users submit jobs using a Web Portal
– this integrates the CCLRC Data Portal with the HPC Portal
Developing tools for collaborative visualisation across the virtual organisation
Collaborating with Peter Murray-Rust to extend the Chemical Markup Language (CML) for computational chemistry
NIEeS
National Institute for Environmental e-Science
Promotes and supports the use of e-Science and grid technologies within the UK environmental science community
Holds workshops, courses, training events, visitor programmes, demonstration projects
Industry event forthcoming (Feb 12th)
– generating much interest
www.niees.ac.uk
NIEeS (contd.)
Up to end of 2003 (since launch in July 2002):
– 14 events held
– 901 participants
e.g. Earth Systems Modelling workshop (Oct 03) received coverage in national press and engaged Earth Simulator community in Japan
Event sponsorship from BNFL, LaserScan
In-kind support from EDINA, ICE, IEMA, MIRO
Additional help from Hi Consulting
Illustration of an e-Science problem
SOC’s latest OCCAM model runs at 1/12 degree resolution, covering the entire globe
Every model day, the model outputs 8 GB of data
– hence the whole data set will be several TB in size
How do we work with this data set?
– might want to do analysis, visualisation, etc.
– extract just the data you want and work with it
– OR move the programs (code) to the data, not vice versa
These are two key principles of e-Science
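A rough calculation shows why moving the data is costly. At the slide's figure of 8 GB per model day, one simulated year produces about 2.9 TB (decimal units). The transfer time below assumes a hypothetical sustained 100 Mbit/s link. That bandwidth is an illustrative assumption, not a figure from the slides.

```python
GB_PER_MODEL_DAY = 8  # OCCAM output rate quoted on the slide


def archive_size_tb(model_days, gb_per_day=GB_PER_MODEL_DAY):
    """Total output in terabytes (decimal units: 1 TB = 1000 GB)."""
    return model_days * gb_per_day / 1000


def transfer_days(size_tb, link_mbit_per_s):
    """Days needed to copy size_tb over a link with the given sustained bandwidth."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    one_year = archive_size_tb(365)
    print(f"{one_year:.2f} TB per model year")
    print(f"{transfer_days(one_year, 100):.1f} days to copy at an assumed 100 Mbit/s")
```

Under these assumptions, copying one model year takes about 2.7 days of continuous transfer. A subset, or a diagnostic computed next to the data, is typically far smaller.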
Working with large data sets
• Subset / resample
• Transform / regrid / rotate
• Analyse
• Compare
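Two of these operations, resampling and comparison, can be sketched on small 2-D fields stored as nested lists. Block averaging coarsens a high-resolution field onto a grid k times coarser in each direction. A root-mean-square (RMS) difference then compares the result with a lower-resolution field. Block averaging and RMS are assumed example choices. Real regridding of ocean model output also has to handle irregular grids and land masks.

```python
import math


def block_average(field, k):
    """Coarsen a 2-D field by averaging non-overlapping k x k blocks.

    The numbers of rows and columns must both be multiples of k.
    """
    ny, nx = len(field), len(field[0])
    if ny % k or nx % k:
        raise ValueError("field dimensions must be multiples of k")
    return [[sum(field[y][x] for y in range(by, by + k) for x in range(bx, bx + k)) / (k * k)
             for bx in range(0, nx, k)]
            for by in range(0, ny, k)]


def rms_difference(a, b):
    """Root-mean-square difference between two fields of the same shape."""
    diffs = [x - y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    high_res = [[1, 3, 5, 7],
                [1, 3, 5, 7],
                [2, 2, 8, 8],
                [2, 2, 8, 8]]
    low_res = [[2.0, 6.0],
               [2.0, 7.0]]
    coarse = block_average(high_res, 2)
    print(coarse, rms_difference(coarse, low_res))
```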
UK e-Science Centres
National e-Science Centre (NeSC)
National Institute for Environmental e-Science (NIEeS)
GADS: Background
• Climate scientists need to access large datasets:
– model data and satellite observations
– data in a variety of formats (netCDF, HDF, GRIB, more), grids and naming conventions
– model intercomparisons (MERSEA)
• Existing standards (DODS/OPeNDAP) are limited
Advantages of GADS
Data are abstracted from storage
Data can be exposed with standard variable names, even if the data files do not conform to standards
Data can be delivered in many formats, irrespective of internal storage format
Deployed as a Web Service
– platform-independent
– compatible with current e-Science advances
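The first three GADS advantages can be sketched together. The storage layer keeps whatever variable names a data file happens to use, a lookup table exposes them under standard names, and the same values can be written out in whichever format the user asks for. The local names, the mapping table and the two output formats (CSV and JSON) are hypothetical choices for illustration. The standard names follow the style of the CF conventions, but this is not the GADS implementation.

```python
import csv
import io
import json

# Hypothetical mapping from names used inside particular data files to standard names.
STANDARD_NAMES = {
    "TEMP": "sea_water_potential_temperature",
    "pot_temp": "sea_water_potential_temperature",
    "SALT": "sea_water_salinity",
}


def expose(file_variables):
    """Rename a file's variables to standard names, dropping any that cannot be mapped."""
    return {STANDARD_NAMES[name]: values
            for name, values in file_variables.items() if name in STANDARD_NAMES}


def deliver(variables, fmt):
    """Serialise {name: [values]} as 'json' or 'csv', regardless of how it was stored."""
    if fmt == "json":
        return json.dumps(variables, sort_keys=True)
    if fmt == "csv":
        names = sorted(variables)
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(names)
        writer.writerows(zip(*(variables[n] for n in names)))  # one row per data point
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    model_file = {"TEMP": [12.1, 11.8], "SALT": [35.0, 35.2]}  # one model's own naming
    print(deliver(expose(model_file), "csv"))
```

Keeping the mapping and the output writers outside the storage layer means a new naming convention or a new format can be added without touching the stored files.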