UVA SILS 20161017 - mkuzak.github.io · Natalie Danezi Large scale computing & Big Data SURFsara...

Natalie Danezi <[email protected]> Large scale computing & Big Data SURFsara e-infrastructures Swammerdam Institute for Life Sciences Workshop, 17 Oct 2016

Page 1: Large scale computing & Big Data

SURFsara e-infrastructures

Natalie Danezi <[email protected]>

Swammerdam Institute for Life Sciences Workshop, 17 Oct 2016

Page 2: SURF family

• Shared Professional and Educational Services
• Scientific Computing & Storage
• Commercial ICT Products & Services
• National Research & Education Network
• eScience Collaboration and Tools

Page 3: High Performance Computing (HPC) in research

Page 4: Does my research fit in HPC?

• Faster results
• Task repetition
• Higher accuracy
• Larger computational domains
• Larger volume of data

http://www.advancedgwt.com/groundwater-software/data-management-and-visualization/groundwater-desktop.html

Page 5: The HPC Infrastructure

Page 6: More users, more resources

Animation: EGI - SURFsara MOOC 2014

http://web.grid.sara.nl/mooc/animations/single_user.html

Page 7: Lisa cluster: batch processing

Animation: EGI - SURFsara MOOC 2014

http://web.grid.sara.nl/mooc/animations/cluster.html

Page 8: Lisa cluster: batch processing

• Calculation intensive applications
• Not data intensive applications
• Well supported software stack
• Relatively easy to start

Page 9: Life Science Grid: cluster of clusters

Animation: EGI - SURFsara MOOC 2014

http://web.grid.sara.nl/mooc/animations/wms.html

Page 10: Life Science Grid: cluster of clusters

• Resources meant for life science researchers
• 11 local clusters (AMC, LUMC, WUR, TUD, RUG, ..)
• Capacity: +/- 12 000 CPU cores, petabytes of storage
• Independent tasks: parameter sweeps, Monte-Carlo, ..
• Linux experience required
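The independent tasks mentioned for the Life Science Grid (parameter sweeps, Monte-Carlo runs) can be illustrated with a small sketch. It is not taken from the slides: the function names `run_task` and `build_sweep` and the choice of estimating pi are assumptions made only for illustration. The point is that each task depends on nothing but its own parameters, so every tuple could be submitted as a separate job.

```python
import itertools
import random


def run_task(n_samples, seed):
    """One independent Monte-Carlo task: estimate pi from n_samples random points."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


def build_sweep(sample_sizes, seeds):
    """Enumerate every (n_samples, seed) combination; each tuple is one task."""
    return list(itertools.product(sample_sizes, seeds))


if __name__ == "__main__":
    tasks = build_sweep([1_000, 10_000], range(3))
    # Run sequentially here; on a cluster or grid each tuple would be its own job.
    for n_samples, seed in tasks:
        print(n_samples, seed, run_task(n_samples, seed))
```

Because the tasks never communicate, the sweep can be spread over as many cores or clusters as are available without changing the code of a single task.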

Page 11: Life Science Grid examples

BBMRI.nl BIOS project

BBMRI.nl: a collaborative Dutch project focusing on BioBank enrichment

The BIOS project
• 6 BioBanks
• 4000+ samples
• 3 measurement types
• 30 TB data

We-NMR
• e-Infrastructure for NMR and structural biology
• One of the largest bio* virtual organizations
• Provides access to grid resources through easy to use web pages

BIOS e-Infrastructure
• Local storage ErasmusMC
• Storage LSG Cluster ErasmusMC
• Grid storage
• Dual tape copies: Amsterdam & Almere
• Data processing: Grid & Cloud
• Lightpath connectivity

Biomedical projects @SURF
• CTMM Translational Research IT (TraIT): develop a long-lasting IT infrastructure for translational research
• BBMRI will form an interface between biological specimens and data (from patients and European populations) and top-level biological and medical research.
• ALS project MinE: analysing and sharing data of large cohort studies to discover genetic profiles
• National project Data4LifeSciences
• European Life Sciences Infrastructure For Biological Information (ELIXIR-NL: DTL)

Large Hadron Collider, LOFAR, GoNL

Page 12: Cartesius Supercomputer

Turbulence modelling

Protein structure

eSALSA

• Large memory, fast interconnect, fast and large I/O
• European collaborations: PRACE, HPC-Europa
• Climate models, cell simulations
• Programming experience required

Page 13: Hadoop

How to index the web

• Big Data
• Exploration/mining of data
• Map / Reduce
• Programming experience required
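The Map / Reduce model named on the Hadoop slide can be sketched in plain Python. This is an in-memory illustration only and does not use Hadoop itself; the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```

In a real cluster the map calls run on the nodes that hold the data and the framework performs the shuffle; here all three steps run in one process to keep the structure visible.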

Page 14: HPC Cloud

• Flexible & controllable
• Microsoft Windows supported
• Ideal for 3rd party application providers
• Graphical user interface
• User is also the system admin

Page 15: Data services

Mastering the data life cycle with e-infrastructure services

Stages of the data life cycle:
• Analyzing data
• Preserving data
• Creating data
• Processing data
• Giving access to data
• Reusing data

Services:
• Central Archive, Grid dCache Storage, B2SAFE/B2SHARE (EUDAT), Persistent Identifier services (EPIC), Research data NL
• Data Ingest Service, Lightpaths, Normal channels (sftp, nfs, http)
• Authentication, Authorization, Collaboration tools: Beehub, FileSender
• Supercomputer, Lisa Cluster, GRID, HPC Cloud, Hadoop
• Visualisation services: Collaboratorium, GPU cluster, mobile setup
• Enter a new cycle, develop a workplan and apply ICT solutions
• NLeScience center integration support
• SURFmarket licences/brokering

Page 16: My laptop is not enough: where to go?

Page 17: SURFsara services

• Lisa national compute cluster
• Life Science Grid: interconnected clusters across the Netherlands
• Cartesius national Supercomputer
• Hathi Hadoop cluster
• Oort HPC Cloud cluster
• Central Archive, Beehub, SURFdrive for Data Services
• …, Visualisation, Networking, Consultancy, Innovation

Page 18: Getting access

• For:
  ๏ Lisa National Cluster, Cartesius Supercomputer
• Apply via:
  ๏ IRIS (NWO grant)

• For:
  ๏ Grid, Hadoop, HPC Cloud, Data Services, Visualisation
  ๏ Or, not sure what suits you best?
• Apply via:
  ๏ https://e-infra.surfsara.nl/ (SURF grant)

Page 19: Standard support

Bring your scientific problem

• We provide advice and support:
  ๏ Getting access
  ๏ Best practices
  ๏ Design & optimisation
  ๏ Integration to large scale

Page 20: Trainings, online tutorials

Page 21: Questions?

[email protected]

020 800 1400

https://www.surf.nl/en/about-surf/subsidiaries/surfsara/