Post on 22-Jun-2020
Collaborative data-driven science
Mike Rippin
Background and History of SciServer
Major Objectives
Current System
SciServer Compute – Now
SciServer Compute – Future
Q&A
“The Project aims to create a sustainable collaborative ecosystem built around several large scientific data sets for the broader science community, based upon the expertise developed for the Sloan Digital Sky Survey (SDSS) SkyServer and associated projects.”
NSF Cooperative Agreement
5-year duration; first 3 years just completed
Development of Cyberinfrastructure
Science Driven
Alex Szalay - PI
Mike Rippin - PM
Ani Thakar, Gerard Lemson, Jordan Raddick, Bonnie Souter - Associate Directors (Team Leads)
Technical Team: Dmitry Medvedev, Manuchehr Taghizadeh-Popp, Jai Won Kim, Sue Werner, Victor Paul, Jan Vandenberg, Lance Joseph, Alainna White, Laszlo Dobos
Started with the SDSS SkyServer
Goal: instant access to rich content
Idea: bring the analysis to the data
Interactive access at the core
Interactive science on petascale data
Create scalable open numerical laboratories
Large footprint across many disciplines
Use commonly shared building blocks
Major national and international impact
Ani Thakar, JHU
Cyber Infrastructure
Science Collaboration
SDSS Integration
Outreach & Education
Cyber Infrastructure
Database storage & Query:
Data analysis:
Data exploration:
User sign-on:
File storage:
Hosted Data
Personal Data
Single Sign-On
Query
Compute
Cyber Infrastructure
Hosted Data
Cyber Infrastructure
Astronomy
Cosmology
Turbulence
Genomics
Materials Science
Oceanography
Compute
Cyber Infrastructure
Server Cluster
Compute
Cyber Infrastructure
Server Cluster
VM VM
Compute
Cyber Infrastructure
VM
Docker Docker Docker
Compute
Cyber Infrastructure
Docker
Jupyter
Compute
Cyber Infrastructure
Docker
Jupyter
INTERACTIVE & SYNCHRONOUS
Engine for executing analysis on data sets
Environment for executing Python Notebooks interactively
Utility API libraries in Python and R
Interacts with ALL other SciServer components that have a WS API:
◦ Login Portal for authentication
◦ CASJobs for queries
◦ SkyServer and SkyQuery for astronomy data
◦ SciDrive for storage
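The interaction pattern listed above (sign on once, query hosted data where it lives, then store the result) can be sketched in a few lines of Python. This is an illustration only, not the SciServer API: `LoginPortal`, `QueryService`, and `FileStore` are hypothetical in-memory stand-ins for the Login Portal, CASJobs, and SciDrive, and the `galaxies` table with its three rows is made-up sample data held in an in-memory SQLite database.

```python
import sqlite3
import uuid


class LoginPortal:
    """Hypothetical stand-in for single sign-on: issues and validates tokens."""

    def __init__(self):
        self._tokens = {}

    def login(self, user, password):
        # Assumption: any non-empty password is accepted in this sketch.
        if not password:
            raise PermissionError("login failed")
        token = uuid.uuid4().hex
        self._tokens[token] = user
        return token

    def user_for(self, token):
        if token not in self._tokens:
            raise PermissionError("invalid token")
        return self._tokens[token]


class QueryService:
    """Hypothetical stand-in for a query service over hosted data."""

    def __init__(self, portal):
        self._portal = portal
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE galaxies (objid INTEGER, redshift REAL)")
        self._db.executemany(
            "INSERT INTO galaxies VALUES (?, ?)",
            [(1, 0.02), (2, 0.15), (3, 0.31)],  # made-up sample rows
        )

    def execute(self, token, sql, params=()):
        self._portal.user_for(token)  # every call is checked against the token
        return self._db.execute(sql, params).fetchall()


class FileStore:
    """Hypothetical stand-in for per-user file storage."""

    def __init__(self, portal):
        self._portal = portal
        self._files = {}

    def write(self, token, path, text):
        self._files[(self._portal.user_for(token), path)] = text

    def read(self, token, path):
        return self._files[(self._portal.user_for(token), path)]


# One token, obtained once, is reused by every component.
portal = LoginPortal()
queries = QueryService(portal)
store = FileStore(portal)

token = portal.login("alice", "secret")
rows = queries.execute(
    token,
    "SELECT objid, redshift FROM galaxies WHERE redshift > ? ORDER BY objid",
    (0.1,),
)
mean_z = sum(z for _, z in rows) / len(rows)
store.write(token, "results/mean_redshift.txt", f"{mean_z:.2f}")
```

The point of the sketch is the shape of the workflow: only the small query result and the derived summary move, while the hosted table stays behind the query service.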
HPC
CloudStore
DB
SkyServer
Compute
SciDrive
indexing
cycles/byte
SciServer
Build on VM/Docker Architecture
Scalable non-interactive, asynchronous Job management (JOBM)
Rich Access Controls (RACM)
Distributed compute execution (COMPM)
Support Python, R, Matlab
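To make the planned split between job management, access control, and compute execution more concrete, here is a minimal sketch using only the Python standard library. It is not SciServer code: `AccessControl`, `JobManager`, and `compute_worker` are hypothetical names, a thread stands in for a compute node, and a plain Python callable stands in for a script or query job.

```python
import queue
import threading


class AccessControl:
    """Hypothetical access check: which user may use which compute resource."""

    def __init__(self, grants):
        self._grants = grants  # {user: {resource, ...}}

    def allowed(self, user, resource):
        return resource in self._grants.get(user, set())


class JobManager:
    """Hypothetical job manager: users push jobs in, workers pull them out."""

    def __init__(self, acl):
        self._acl = acl
        self._pending = queue.Queue()
        self._status = {}
        self._results = {}
        self._lock = threading.Lock()
        self._next_id = 0

    def submit(self, user, resource, func, *args):
        # Access is checked before the job is ever queued.
        if not self._acl.allowed(user, resource):
            raise PermissionError(f"{user} may not use {resource}")
        with self._lock:
            self._next_id += 1
            job_id = self._next_id
            self._status[job_id] = "QUEUED"
        self._pending.put((job_id, func, args))
        return job_id  # returned immediately; the job runs asynchronously

    def pull(self):
        # Called by workers; blocks until a job is available.
        return self._pending.get()

    def report(self, job_id, status, result=None):
        with self._lock:
            self._status[job_id] = status
            self._results[job_id] = result
        self._pending.task_done()

    def status(self, job_id):
        with self._lock:
            return self._status[job_id]

    def result(self, job_id):
        with self._lock:
            return self._results[job_id]

    def wait_all(self):
        self._pending.join()


def compute_worker(manager):
    """Stand-in for a compute node: pull a job, run it, report back."""
    while True:
        job_id, func, args = manager.pull()
        try:
            manager.report(job_id, "FINISHED", func(*args))
        except Exception as exc:
            manager.report(job_id, "FAILED", repr(exc))


acl = AccessControl({"alice": {"cluster-a"}})
manager = JobManager(acl)
for _ in range(2):
    threading.Thread(target=compute_worker, args=(manager,), daemon=True).start()

job_a = manager.submit("alice", "cluster-a", sum, [1, 2, 3])
job_b = manager.submit("alice", "cluster-a", pow, 2, 10)
manager.wait_all()
```

Having workers pull from a queue, rather than having the manager assign jobs to specific nodes, is one simple way to let execution scale with the number of workers; whether the real system does this is not stated on the slides.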
[Architecture diagram, labels only]
Dashboard UI
SciDrive
Plugins
CASJobs
Resource and Access Control Manager
Job Manager: Job List, Metadata, ACLs, Resources
Jobs: Script Job, Script Job, Query Job
Compute Manager, Server Cluster (D D D D D D D)
Compute Manager, Server Cluster (D D D D D D D)
Jupyter
Connections: PUSH, PULL, Async
Jupyter Notebook (INVISIBLE)
<code>{
CJ.Query()
SD.Write()
Open.File()
}
CASJobs, Persistent File, SciDrive
MyDB, ScratchDB, Scratch File, SDSS, Turbulence, MatSci
EXTERNAL
D
SciServer Compute Interactive is live now
Supports Python, R, Jupyter
Runs on a 4-node cluster
Access to several domain databases
Asynchronous job execution expected in early 2017
Please register with SciServer and try it out