Research in the Cloud


Professor David Wallom

Citizen science for extreme weather attribution to climate change

Our laboratory: the world’s largest climate modelling facility

11 years, 25 sub projects, ~100,000 volunteers (40,000 active), 127M model-years

Unlimited ensemble size: exploring uncertainties in climate predictions

Results of the BBC Climate Change Experiment: Rowlands et al., Nature Geosci., 2012

What is the role of increased greenhouse gas levels in UK autumn/winter flood events?

South Oxford on January 5th, 2003

Photo: Dave Mitchell

Dawlish, Devon, February 5th 2014 (MetOffice)

Midlands, November 2012 (BBC website)

The weather@home regional modelling project (with Microsoft Research, the Risk Prediction Initiative and Environment Guardian)

• High impact weather events are typically rare and unpredictable.
– Flooding
– Heatwave
– Drought

• They also involve small scales.

• Resolution provided by nested regional model.

• Modify boundary conditions to mimic counter-factual “world that might have been”.

UK Winter 2014 Floods

• 39726 simulations

• 2014 flooding has been described as a 1 in 100 year event in terms of rainfall volume

• Return time plot shows this has become a 1 in 80 year event in terms of risk

• Risk of a very wet winter has increased by 25%
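The 25% figure follows directly from the two return times quoted above, if they are read as annual probabilities of 1/100 and 1/80. A minimal sketch of that arithmetic, with hypothetical function names chosen for illustration (this is not the project's analysis code):

```python
def risk_ratio(return_time_before: float, return_time_now: float) -> float:
    """Ratio of annual probabilities: (1 / now) / (1 / before)."""
    if return_time_before <= 0 or return_time_now <= 0:
        raise ValueError("return times must be positive")
    return return_time_before / return_time_now


def percent_increase(ratio: float) -> float:
    """Convert a risk ratio into a percentage increase in risk."""
    return (ratio - 1.0) * 100.0


# Figures from the slide: a 1 in 100 year event has become a 1 in 80 year event.
ratio = risk_ratio(return_time_before=100, return_time_now=80)
print(f"risk ratio: {ratio:.2f}")                           # risk ratio: 1.25
print(f"increase in risk: {percent_increase(ratio):.0f}%")  # increase in risk: 25%
```

A shorter return time means a more frequent event, so the ratio is above 1 whenever the return time has fallen.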

World Weather Attribution

Californian Drought Experiment

• Small example of future requirements

• Investigate effect of climate change on the current drought in California
– 5k current conditions including ‘the blob’
– 5k current conditions with averaged SST
– 12k natural runs

• Time relevant results

http://www.climateprediction.net/weatherathome/western-us-drought/

Consequences for climateprediction.net

• Data set creation on a monthly basis
• Data consumers outside the project with delivery expectation
• This is a pilot for two regions (EU & PNW); there are 13 in the envisaged future deployment of WWA globally!
• Per region workunit (WU) and data issues
– Monthly release of >20k 2 month models covering current and future 2 months -> >60k workunits permanently deployed per month
– Each model generates about 200 - 350MB…
• WWA is one of 7 on-going projects
• Need more capacity (volunteers) if we are to continue with other parallel research projects!
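To put the per-region figures in context, here is a rough back-of-envelope estimate of monthly output volume. It is a sketch under stated assumptions: exactly 20,000 models per release (the slide says "more than 20k", so these are lower bounds), every model returning its full 200 to 350 MB, and decimal units (1 TB = 1,000,000 MB). The function name is hypothetical.

```python
MODELS_PER_RELEASE = 20_000    # slide says ">20k"; treated here as a lower bound
MB_PER_MODEL_RANGE = (200, 350)  # slide: "about 200 - 350MB" per model
PILOT_REGIONS = 2              # EU and PNW
PLANNED_REGIONS = 13           # envisaged global WWA deployment


def monthly_volume_tb(models: int, mb_per_model: float, regions: int = 1) -> float:
    """Monthly output in decimal terabytes (1 TB = 1,000,000 MB)."""
    return models * mb_per_model * regions / 1_000_000


for label, regions in (("per region", 1),
                       ("pilot", PILOT_REGIONS),
                       ("planned", PLANNED_REGIONS)):
    low, high = (monthly_volume_tb(MODELS_PER_RELEASE, mb, regions)
                 for mb in MB_PER_MODEL_RANGE)
    print(f"{label:>10}: {low:.0f} to {high:.0f} TB per month")
```

Under these assumptions one region produces roughly 4 to 7 TB per month, and a 13-region deployment roughly 52 to 91 TB per month.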

Our laboratory: the world’s largest climate modelling facility

“The Virtual Volunteer”

Using the Cloud to improve our climate modelling facility

• Utilise free tier to provide virtual volunteer (taking low priority runs initially)

• Cut down OS to minimise footprint

• Configuring to produce full scan runs of all resource types for benchmark

Using the Cloud to improve our climate modelling facility

Virtual Volunteers only supporting specific projects

• Move all computational resource into cloud for specific projects (If we were to see widespread volunteer movement away from useful systems)


An information environment for neuroscientists

What Neuroscientists would like to see:
1. VRE – single point of contact
2. A consistent annotation method for data archiving
3. Web & shared filesystem based repository for data. Many file formats to be supported
4. A searchable database for images
5. A searchable database for video images
6. A document share tool for ‘live’ manuscript editing
7. File space for literature sharing (PDFs)
8. Blog area

More Specifically... Help Managing Data

Initial Experimental Idea → Experimental Design → Data Collection → Analysis → Publication

Challenges

• Interdisciplinary teams – different expectations, cultures, requirements

• Agreed standards
– Different data formats: Microscopes (Multi-photon or Confocal), Live cell fluorescent imaging, Electrophysiology recordings
– Meta data standards

• Complexity of tools used in community

• Ability to share images, data, analysis

• Network connectivity not the best

Release 2.0

• Drupal – Frontend content management. Based on Drupal Commons,

• Alfresco – Backend data management. Modified Alfresco module,

• Apache Solr – Search engine,

• Apache Tika – Metadata extraction toolkit for documents,

• Google services – Docs and Calendar,

• Cloud-based computation using GPUs,

• NCBO ontology-based tagging,

• LDAP – Single sign-on,

• Digital Pens – Used for recording experiments,

• XML-RPC desktop client – uploading and generating content.
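As an illustration of the last item, the sketch below shows what an XML-RPC upload call looks like on the wire, using only Python's standard `xmlrpc.client` module. The method name `neurohub.upload`, the argument layout, and the helper function names are hypothetical and chosen for illustration; the slides do not describe the real Neurohub interface.

```python
import xmlrpc.client


def encode_upload_call(filename, payload, tags):
    """Serialise a hypothetical upload request as an XML-RPC method call."""
    params = (filename, xmlrpc.client.Binary(payload), tags)
    # "neurohub.upload" is an assumed method name, not a documented one.
    return xmlrpc.client.dumps(params, methodname="neurohub.upload")


def decode_call(request_xml):
    """Parse an XML-RPC request back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(request_xml, use_builtin_types=True)
    return method, params


request_xml = encode_upload_call(
    "slice_01.tif", b"\x00\x01\x02", ["confocal", "hippocampus"]
)
method, params = decode_call(request_xml)
print(method)     # neurohub.upload
print(params[0])  # slice_01.tif
```

Binary file content travels as a base64 element inside the XML body, which keeps a client like this simple but makes it a poor fit for very large image files.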

Neurohub in the cloud

• Deployment of automated analysis services

[Diagram labels: UPLOAD; Image Processing Engine (IPE); RPC Call; RESULTS; ANALYSIS; multiple Image Processing Engines; DOWNLOAD]

Neurohub in the cloud

• Neurohub System Deployment
– Current deployments
  • Departmental physical server
    • Pro: data locality
    • Con: limited scalability and resilience, collaborator access difficult
  • Private Cloud
    • Pro: increased resilience and scalability
    • Con: system visibility outside researcher control, collaborator access difficult, difficult to grow adoption


– Public Cloud Deployment
  • Pro: published item deployable on demand, anywhere, independent of researcher location
  • Con: may have legal/ethical restrictions on data hosting (support load when your AMI becomes the next LIMS of choice in wet lab science?!)


Conclusions

• We have used AWS cloud within these and other projects successfully

• We are intending to grow with further projects and utilisation of AWS covering Volunteer, urgent and service computing models

– Bash the Bug
– Ocean Sampling Day

• AWS Cloud is infrastructure and requires knowledge and support to set up and configure

• Cloud is not a magic bullet that will immediately solve all issues; it may actually create new ones

• Ensure you are using the right tool for the right job


THANK YOU & QUESTIONS