Page 1: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Science Cloud

Paul Watson, Newcastle University, UK

paul.watson@ncl.ac.uk

Page 2: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Research Challenge

Understanding the brain is the greatest informatics challenge

• Enormous implications for science:
– Medicine
– Biology
– Computer Science

Page 3: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Collecting the Evidence

100,000 neuroscientists generate huge quantities of data:
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural

Page 4: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Neuroinformatics Problems

• Data is:
– expensive to collect but rarely shared
– in proprietary formats & locally described
• The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise

Page 5: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Data in Science

• Bowker’s “Standard Scientific Model”

1. Collect data

2. Publish papers

3. Gradually lose the original data

The New Knowledge Economy & Science & Technology Policy, G.C. Bowker

• Problems:
– papers often draw conclusions from data that is not published
– inability to replicate experiments
– data cannot be re-used

Page 6: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Codes in Science

• Three stages for codes

1. Write code and apply to data

2. Publish papers

3. Gradually lose the original codes

• Problems:
– papers often draw conclusions from codes that are not published
– inability to replicate experiments
– codes cannot be re-used

Page 7: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Plan

• Neuroinformatics - a challenging e-science application
• CARMEN – addressing the challenges
• Cloud Computing for e-science
– Lessons we’ve Learnt
• The Promise of Commercial Clouds

Page 8: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Focus on Neural Activity

Cracking the neural code

[Figure: signals labelled neurone 1, neurone 2 and neurone 3]

Raw voltage signal data is typically collected using single- or multi-electrode array recording

Page 9: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Epilepsy Exemplar

Data analysis guides surgeon during operation

Further analysis provides evidence

WARNING! The next 2 slides show an exposed human brain

Page 10: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.
Page 11: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.
Page 12: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

CARMEN

enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated

Page 13: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

CARMEN Project

[Map of project sites: Stirling, St. Andrews, Newcastle, York, Sheffield, Cambridge, Imperial, Plymouth, Warwick, Leicester, Manchester]

UK EPSRC e-Science Pilot

$7M (2006-10)

20 Investigators

Page 14: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Industry & Associates

Page 15: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

CARMEN e-Science Requirements

• Store
– very large quantities of data (100TB+)
• Analyse
– suite of neuroinformatics services
– support data intensive analysis
• Automate
– workflow
• Share
– under user-control

Page 16: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Background: North East Regional e-Science Centre

• 25 Research Projects across many domains:
– Bioinformatics, Ageing & Health, Neuroscience, Chemical Engineering, Transport, Geomatics, Video Archives, Artistic Performance Analysis, Computer Performance Analysis, ...
• Same key needs:
– Store
– Analyse
– Automate
– Share

Page 17: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Result: e-Science Central

• Integrated Store-Analyse-Automate-Share infrastructure
• Web-based
• Generic
– CARMEN neuroinformatics & chemistry as pilots

Page 18: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Science Cloud Architecture

[Diagram: data storage and analysis in the cloud, accessed over the Internet (typically via browser) to upload data & services and run analyses]

Page 19: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Cloud Services Continuum (based on Robert Anderson)

Service levels: Software (SaaS), Platform (PaaS), Infrastructure (IaaS)

Examples shown: Google Apps, Salesforce.com, Google AppEngine, Microsoft Azure, Amazon EC2 & S3

http://et.cairene.net/2008/07/03/cloud-services-continuum/

Page 20: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Science Cloud Options

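[Diagram: two stacks]

Option 1: Science App 1 .... Science App n, each built directly on Cloud Infrastructure: Storage & Compute

Option 2: Science App 1 .... Science App n, built on a Science Platform, which sits on Cloud Infrastructure: Storage & Compute

Labels: Users, Service Developers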

Page 21: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

CARMEN Cloud

[Diagram components: Filestore with Pattern Search, Database, Metadata, Service Repository, Processing, Workflow Enactment, Workflow, Security, Browsers & Rich Clients]

Page 22: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Editing and Running a Workflow on the Web

Page 23: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Viewing the output of Workflow Runs

Workflow

Result File

Page 24: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Viewing results

Page 25: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Blogs and links

Communicating Results

Linking to results & workflows

Page 26: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

What we learnt: Moving into a Cloud

• Moving existing technologies into a cloud can be difficult
– some can’t run in a Cloud at all

Page 27: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Raw Data Exploration with Signal Data Explorer

Page 28: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

What we learnt: Scalability

• Clouds offer the potential for scalability
– grab compute power only when needed
• But developers have to write scalable code
– for Infrastructure as a Service Clouds

Page 29: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Dynasoar: Dynamic Deployment

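[Diagram: a Consumer (C) sends a request (req) to the Web Service Provider (WSP) and receives a response (res). The Host Provider has node 1 (s2, s5), node 2 and node n (s2). Steps 1 to 3 are marked; step 2 is "service fetch & deploy" from the Service Repository (SR).]

A request to s4

The deployed service remains in place and can be re-used - unlike job scheduling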

Page 30: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Dynasoar

[Diagram: the Consumer (C) sends a request (req) to the Web Service Provider (WSP) and receives a response (res). The Host Provider has node 1 (s2, s5), node 2 and node n (s2).]

A request for s2 is routed to an existing deployment of the service
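The routing behaviour on the two Dynasoar slides can be sketched roughly as follows. This is a minimal illustration, not Dynasoar's actual code: the class names, the `route` method, and the "fewest deployed services" placement policy are all assumptions made for the example. Only the node contents (s2, s5 on node 1; s2 on node n) and the fetch-deploy-reuse behaviour come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class HostProvider:
    """Hypothetical Host Provider: maps each node name to its deployed services."""
    nodes: Dict[str, Set[str]]

    def node_running(self, service: str) -> Optional[str]:
        """Return a node that already has the service deployed, if any."""
        for name, deployed in self.nodes.items():
            if service in deployed:
                return name
        return None

    def pick_node(self) -> str:
        # Assumed placement policy: the node with the fewest deployed services.
        return min(self.nodes, key=lambda name: len(self.nodes[name]))


@dataclass
class WebServiceProvider:
    """Hypothetical Web Service Provider that routes requests to nodes."""
    host: HostProvider
    repository: Set[str]  # services held in the Service Repository
    log: List[str] = field(default_factory=list)

    def route(self, service: str) -> str:
        node = self.host.node_running(service)
        if node is not None:
            # Existing deployment: re-use it, no fetch needed.
            self.log.append(f"reuse {service} on {node}")
            return node
        if service not in self.repository:
            raise KeyError(f"{service} is not in the Service Repository")
        # No deployment yet: fetch from the repository and deploy on a node.
        node = self.host.pick_node()
        self.host.nodes[node].add(service)
        self.log.append(f"deploy {service} on {node}")
        return node


host = HostProvider({"node 1": {"s2", "s5"}, "node 2": set(), "node n": {"s2"}})
wsp = WebServiceProvider(host, repository={"s2", "s4", "s5"})
first = wsp.route("s4")     # not deployed anywhere: fetched and deployed
second = wsp.route("s4")    # now deployed: the same node is re-used
existing = wsp.route("s2")  # already deployed: routed to an existing node
print(wsp.log)
```

The second request to s4 triggers no new deployment, which is the contrast with job scheduling drawn on the slide.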

Page 31: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Adaptive Dynamic Deployment with Dynasoar

[Chart: response time (seconds) and processors in pool, plotted against arrival rate (messages per second)]

Adding processors as you need them optimises resources and saves money in pay-as-you-go clouds

Commercial pay-as-you-go clouds would allow us to avoid this limit
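One simple way to picture "adding processors as you need them" is a threshold rule on measured response time. The sketch below is only an illustration: the function name `adjust_pool`, the 2-second target, and the scale-down rule are assumptions, not the policy used in the experiment. Passing `max_size=None` models a pay-as-you-go cloud with no fixed pool limit.

```python
from typing import List, Optional


def adjust_pool(pool_size: int, response_time_s: float, *,
                target_s: float = 2.0, min_size: int = 1,
                max_size: Optional[int] = 16) -> int:
    """Return the new pool size after one response-time measurement.

    All thresholds are hypothetical. max_size=None means no upper limit.
    """
    can_grow = max_size is None or pool_size < max_size
    if response_time_s > target_s and can_grow:
        return pool_size + 1  # too slow: add a processor
    if response_time_s < target_s / 2 and pool_size > min_size:
        return pool_size - 1  # comfortably fast: release a processor
    return pool_size


def simulate(response_times: List[float], max_size: Optional[int]) -> List[int]:
    """Apply adjust_pool to a sequence of measurements, starting from one processor."""
    pool, sizes = 1, []
    for rt in response_times:
        pool = adjust_pool(pool, rt, max_size=max_size)
        sizes.append(pool)
    return sizes


measurements = [0.5, 3.0, 5.0, 4.0, 1.5, 0.4]
capped = simulate(measurements, max_size=3)       # fixed local pool
uncapped = simulate(measurements, max_size=None)  # pay-as-you-go, no limit
print(capped, uncapped)
```

With the cap, the pool stops growing at 3 processors even while response time is still above target; without it, the pool keeps growing until the load is absorbed.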

Page 32: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Hot Off the Press..

• Recent experiments with Microsoft Azure Cloud
– running Chemical analyses
– Silverlight UI

Thanks to:
– Paul Appleby & Team at the Microsoft Technology Centre, Reading
– & MS e-Science Group

Page 33: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.
Page 34: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.
Page 35: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Microsoft Azure Cloud for e-Science Demo

Page 36: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Why are Commercial Clouds Important: Before

Research

1. Have good idea

2. Write proposal

3. Wait 6 months

4. If successful, wait 3 months

5. Install Computers

6. Start Work

Science Start-ups

1. Have good idea

2. Write Business Plan

3. Ask VCs to fund

4. If successful..

5. Install Computers

6. Start Work

Page 37: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Why Use Commercial Clouds:

1. Have good idea

2. Grab nodes from Cloud provider

3. Start Work

4. Pay for what you used

• also scalability, cost, sustainability

Page 38: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Commercial Clouds to the Rescue?

• Focus currently on infrastructure as a service

• But, this is only part of the stack

• Can we have pay-as-you-go Science Cloud Platforms?

Page 39: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

A Sustainable Science Cloud

[Diagram layers, top to bottom: Science App 1 .... Science App n; Science Platform as a Service; Cloud Infrastructure: Storage & Compute]

Labels: Commercial Clouds, e-Science Central, www.inkspotscience.com, ? ?

Problem: delivering the e-science platform

Page 40: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Summary: e-Science Central & CARMEN

[Diagram labels: Software as a Service, Cloud Computing, Social Networking, e-Science Central / CARMEN]

• Dynamic Resource Allocation
• Pay-as-you-Go*
• Web based
• Works anywhere
• Controlled Sharing
• Collaboration
• Communities

Page 41: Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk.

Summary

• e-Science Central
– Store-Analyse-Automate-Share e-science platform
– Adding content from a range of domains
• CARMEN is piloting this approach for neuroinformatics
• Cloud computing can revolutionise e-science
– reduce time from idea to realisation