
Posted on 22-Jun-2020


Collaborative data-driven science


Mike Rippin


Background and History of SciServer

Major Objectives

Current System

SciServer Compute – Now

SciServer Compute – Future

Q&A


“The Project aims to create a sustainable collaborative ecosystem built around several large scientific data sets for the broader science community, based upon the expertise developed for the Sloan Digital Sky Survey (SDSS) SkyServer and associated projects.”


NSF Cooperative Agreement

5-year duration; the first 3 years just completed

Development of Cyberinfrastructure

Science Driven


Alex Szalay – PI

Mike Rippin – PM

Ani Thakar, Gerard Lemson, Jordan Raddick, Bonnie Souter – Associate Directors (Team Leads)

Technical Team: Dmitry Medvedev, Manuchehr Taghizadeh-Popp, Jai Won Kim, Sue Werner, Victor Paul, Jan Vandenberg, Lance Joseph, Alainna White, Laszlo Dobos


Started with the SDSS SkyServer

Goal: instant access to rich content

Idea: bring the analysis to the data

Interactive access at the core


Interactive science on petascale data

Create scalable open numerical laboratories

Large footprint across many disciplines

Use commonly shared building blocks

Major national and international impact

Ani Thakar, JHU


Cyber Infrastructure

Science Collaboration

SDSS Integration

Outreach & Education


Cyber Infrastructure


Database storage & Query: CASJobs

Data analysis: SciServer Compute

Data exploration: SkyServer and SkyQuery

User sign-on: Login Portal

File storage: SciDrive


Cyber Infrastructure: Hosted Data, Personal Data, Single Sign-On, Query, Compute


Hosted Data: Astronomy, Cosmology, Turbulence, Genomics, Materials Science, Oceanography

Compute

Server Cluster: hosts multiple VMs

VM: hosts multiple Docker containers

Docker container: runs Jupyter

INTERACTIVE & SYNCHRONOUS

Engine for executing analysis on data sets

Environment for executing Python notebooks interactively

Utility API libraries in Python and R

Interacts with ALL other SciServer components that have a web service (WS) API:
◦ Login Portal for authentication
◦ CASJobs for queries
◦ SkyServer and SkyQuery for astronomy data
◦ SciDrive for storage
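Compute reaches the other SciServer components through their web service APIs, with the Login Portal handling authentication. As a rough sketch of that pattern, the Python below models a thin utility client that attaches a Login Portal token to every request it builds. The base URL, the endpoint paths, the `X-Auth-Token` header name and the `SciServerClient` class are all hypothetical placeholders, not the real SciServer API; requests are built but never sent.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Hypothetical base URL: a placeholder, not a real SciServer endpoint.
BASE_URL = "https://sciserver.example.org"


@dataclass
class SciServerClient:
    """Hypothetical thin client for SciServer-style web service (WS) APIs.

    The token is assumed to come from the Login Portal; other components
    (CASJobs, SciDrive, ...) are reached by attaching it to each request.
    """

    token: str
    base_url: str = BASE_URL

    def _request(self, method: str, path: str,
                 payload: Optional[dict] = None) -> urllib.request.Request:
        # Build (but do not send) an authenticated request with a JSON body.
        data = json.dumps(payload).encode("utf-8") if payload is not None else None
        req = urllib.request.Request(self.base_url + path, data=data, method=method)
        req.add_header("X-Auth-Token", self.token)  # assumed header name
        if data is not None:
            req.add_header("Content-Type", "application/json")
        return req

    def query_request(self, sql: str, context: str = "MyDB") -> urllib.request.Request:
        # Hypothetical CASJobs-style endpoint: run `sql` in a database context.
        return self._request("POST", f"/casjobs/contexts/{context}/query", {"Query": sql})

    def file_request(self, path: str) -> urllib.request.Request:
        # Hypothetical SciDrive-style endpoint: fetch a stored file.
        return self._request("GET", f"/scidrive/files/{path}")


if __name__ == "__main__":
    client = SciServerClient(token="token-from-login-portal")
    req = client.query_request("SELECT TOP 10 objID, ra, dec FROM PhotoObj")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a real service.
```

Keeping authentication in one place means each component only needs to accept the same token, which is the point of a single sign-on layer.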


[Diagram: HPC, CloudStore, DB, SkyServer, Compute, SciDrive and SciServer positioned along two axes, indexing and cycles/byte.]


Build on VM/Docker Architecture

Scalable, non-interactive, asynchronous Job Management (JOBM)

Rich Access Controls (RACM)

Distributed compute execution (COMPM)

Support for Python, R and MATLAB
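JOBM is described here only as scalable, non-interactive, asynchronous job management. A minimal in-process sketch of what "asynchronous" means in that setting: `submit` returns a job ID at once, the work runs in the background, and the caller polls the status and collects the result later. The `JobManager` class, its methods and the status names are hypothetical, and a local thread pool stands in for the distributed Compute Managers (COMPM); a real service would also persist jobs.

```python
import itertools
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobManager:
    """Hypothetical asynchronous job manager illustrating the JOBM idea.

    A local thread pool stands in for remote Compute Managers.
    """

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[int, Future] = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def submit(self, fn: Callable[..., Any], *args: Any) -> int:
        # Return a job ID immediately; the work runs in the background.
        with self._lock:
            job_id = next(self._ids)
            self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: int) -> str:
        # Poll the job without blocking.
        fut = self._jobs[job_id]
        if fut.cancelled():
            return "CANCELLED"
        if fut.running():
            return "RUNNING"
        if not fut.done():
            return "QUEUED"
        return "FAILED" if fut.exception() is not None else "FINISHED"

    def result(self, job_id: int, timeout: Optional[float] = None) -> Any:
        # Wait for the job; re-raises any exception the job raised.
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    jm = JobManager()
    job = jm.submit(sum, range(1_000_000))
    print("submitted", job, jm.status(job))
    print("result", jm.result(job), jm.status(job))
    jm.shutdown()
```

The contrast with the interactive Jupyter path is that nothing here holds a session open: the caller can disconnect after `submit` and return for the result later.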


[Architecture diagram: Dashboard UI, SciDrive Plugins and CASJobs; the Resource and Access Control Manager; a Job Manager holding the Job List, Metadata, ACLs and Resources; Script Jobs and a Query Job; two Compute Managers, each running Docker containers (D) on a Server Cluster; Jupyter. Links are labeled PUSH, PULL and Async.]
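The diagram shows a Job Manager holding a job list, metadata, ACLs and resources, with jobs moving to Compute Managers by PUSH and PULL. One possible reading of that flow, sketched under loud assumptions: submitted jobs are pushed into a queue only if an access check in the spirit of RACM allows the owner to use the target resource, and each Compute Manager pulls the oldest job for a resource it serves. The classes, fields, ACL layout and resource names (`sdss-db`, `turbulence-db`) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Job:
    job_id: int
    owner: str
    kind: str       # "script" or "query", after the diagram's Script Job / Query Job
    resource: str   # hypothetical name of the data set or compute domain targeted


@dataclass
class AccessControl:
    """Hypothetical RACM-style ACL: resource name -> users allowed to use it."""

    acl: Dict[str, Set[str]] = field(default_factory=dict)

    def can_use(self, user: str, resource: str) -> bool:
        return user in self.acl.get(resource, set())


class JobQueue:
    """Hypothetical Job Manager job list that Compute Managers pull from."""

    def __init__(self, racm: AccessControl) -> None:
        self.racm = racm
        self.pending: Deque[Job] = deque()
        self.rejected: List[Job] = []

    def push(self, job: Job) -> None:
        # Reject up front if the owner lacks access to the target resource.
        if self.racm.can_use(job.owner, job.resource):
            self.pending.append(job)
        else:
            self.rejected.append(job)

    def pull(self, resources: Set[str]) -> Optional[Job]:
        # Hand a Compute Manager the oldest job targeting a resource it serves.
        for job in list(self.pending):
            if job.resource in resources:
                self.pending.remove(job)
                return job
        return None


if __name__ == "__main__":
    racm = AccessControl({"sdss-db": {"alice"}})
    queue = JobQueue(racm)
    queue.push(Job(1, "alice", "query", "sdss-db"))
    print(queue.pull({"sdss-db"}))
```

Checking access at submission keeps unauthorized work out of the queue entirely, so Compute Managers never need to hold the ACLs themselves.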


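[Diagram detail: an INVISIBLE Jupyter Notebook in a Docker container (D) runs code calling CJ.Query(), SD.Write() and Open.File(). Storage and data shown: CASJobs (MyDB, ScratchDB), Persistent File, Scratch File, SciDrive, the SDSS, Turbulence and MatSci data sets, and EXTERNAL.]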
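The code box names three calls, `CJ.Query()`, `SD.Write()` and `Open.File()`, made by a notebook that runs without a visible interface. The sketch below imitates that sequence with local stand-ins: a query against a toy SQLite table (standing in for a CASJobs context such as MyDB), a CSV write to a directory (standing in for SciDrive), and a read back from the stored file. `FakeCasJobs`, `FakeSciDrive`, `script_job`, the table and its rows are all hypothetical and do not reflect the real SciServer libraries.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


class FakeCasJobs:
    """Hypothetical stand-in for CJ: runs SQL against an in-memory 'MyDB'."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE objects (obj_id INTEGER, ra REAL, dec REAL)")
        self.db.executemany(
            "INSERT INTO objects VALUES (?, ?, ?)",
            [(1, 10.5, -1.2), (2, 185.0, 15.3), (3, 200.1, 2.7)],
        )

    def query(self, sql: str) -> List[Tuple]:
        return self.db.execute(sql).fetchall()


class FakeSciDrive:
    """Hypothetical stand-in for SD: stores files under a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, name: str, rows: List[Tuple]) -> Path:
        path = self.root / name
        with path.open("w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        return path


def script_job(cj: FakeCasJobs, sd: FakeSciDrive) -> List[List[str]]:
    # 1. Query (cf. CJ.Query()): select objects in a toy RA window.
    rows = cj.query("SELECT obj_id, ra, dec FROM objects WHERE ra > 100 ORDER BY obj_id")
    # 2. Write (cf. SD.Write()): store the result as CSV.
    path = sd.write("result.csv", rows)
    # 3. Open (cf. Open.File()): read the stored file back.
    with path.open(newline="") as fh:
        return list(csv.reader(fh))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(script_job(FakeCasJobs(), FakeSciDrive(Path(tmp))))
```

Because the job function receives its services as arguments, the same notebook logic could run interactively or as a headless script job; only the objects passed in change.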


SciServer Compute Interactive is live now

Supports Python, R and Jupyter

Runs on a 4-node cluster

Access to several domain databases

Asynchronous Job Execution: planned for early 2017

Please register with SciServer and try it out
