Cray Urika-XC Support of Analytics Workflows on Shaheen
HPC 101 Shaheen II Training Workshop
Dr Samuel Kortas, Computational Scientist
KAUST Supercomputing Laboratory
31 January 2019
Outline
● Current status
● What is the Urika-XC Analytics stack?
● Some more detailed use cases
● Q/A
Current status
● The Urika-XC Analytics stack has been installed since November 2018, and a few users have already used it successfully.
● Urika-XC is a mature and stable environment provided by Cray.
● Other modules are also directly available on Shaheen (e.g. tensorflow/1.8), and we are learning to build customized solutions.
● We are currently developing a web portal to ease access to these resources.
● Live, regularly updated documentation is available at http://hpc.kaust.edu.sa/analytics
Outline
● Current status
● What is the Urika-XC Analytics stack?
● Some more detailed use cases
● Q/A
Idea: using Shaheen's resources for workflows other than classical numerical simulation
© CRAY, courtesy of Dr James D. Maltby
The 2 components of Urika-XC
© CRAY, courtesy of Dr James D. Maltby
Urika-XC packages
© CRAY, courtesy of Dr James D. Maltby
Outline
● Current status
● What is the Urika-XC Analytics stack?
● Some more detailed use cases:
 – a wide variety of open source software available,
 – a secured container technology,
 – setting up a user interface (Jupyter Notebook).
Software Available
© CRAY, courtesy of Dr James D. Maltby
The power of containers with a magical Cray sauce
© CRAY, courtesy of Dr James D. Maltby
● Interactive session only!
● Use regular Conda!
● Tuned by Cray for XC!
● Docker, secured for HPC!
What is a container?
● Package Software into Standardized Units for Development, Shipment and Deployment
– Standard
– Lightweight
– Secure
● Docker Containers Are Everywhere: Linux, Windows, Data center, Cloud, Serverless, etc.
(definition given at http://docker.com)
● A 10-year-old technology: LXC (Linux Containers), FreeBSD Jails, AIX Workload Partitions and Solaris Containers have been around for years, but…
● It was popularized by docker.com, with a huge repository of available images: more than 3.5 million dockerized applications at http://hub.docker.com
What is a container?
With containers, you are no longer trapped by the OS and software environment of Shaheen.
Download an existing container from hub.docker.com.
Modify it, develop on your workstation or laptop, and deploy immediately to Shaheen, IBEX, Amazon Web Services…
→ the overhead (in memory and performance) is very low compared to virtualization
But… Docker is not well suited to the HPC world:
● Security issue: it is easy to become root and to browse any part of a file system.
● Performance issue: Docker is hardware-agnostic… How can it perform well on an HPC system with a tuned network? How can it detect which GPU or CPU it is running on?
● License issue: Docker's license and business model are not that clear, which makes it risky to base any open science project on it.
Alternative solutions have been designed
Both Singularity (from LBL) and Shifter (from NERSC) can use Docker containers.
Although a container can be tweaked on your machine with full permissions, any attempt to become root or to read an unauthorized file system is denied when the container is run via Shifter.
Part of the Cray analytics software stack is a single Shifter container tuned by Cray to perform at its best on XC.
How does it work? Where does it happen?
[Figure: Shaheen II layout: login nodes cdl1 to cdl4, Gateway1 and Gateway2, compute nodes, and the /home and /lustre file systems]
Urika-XC analytics available from...
So how does it work? Which commands? Where does it happen?
● The Cray Urika-XC Analytics software stack is available exclusively from Shaheen's gateway2, so every interactive operation has to be launched from there:
ssh gateway2
● Once on gateway2, the following modules are available:
 – analytics: provides Spark, Anaconda Python and R environments, TensorFlow and Jupyter notebooks,
 – shifter: provides HPC-tuned, secure support for Docker images,
 – cge: enables the Cray Graph Engine.
WARNING!
● Gateway2 is an essential component of Shaheen II, used by SLURM to launch and schedule its jobs.
● DON'T run any code on gateway2. Only use gateway2 to:
 – launch an interactive SLURM session with salloc,
 – launch training with run_training,
 – forward ports to any user interface running on a node,
 – if building your own conda environment, prefer a path in /project:
conda create -p /project/userxxx/.conda/env/my_env
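To make the warning above harder to forget, your own scripts can refuse to start on a gateway. The function below is a hypothetical sketch, not part of the analytics module. It assumes that gateway hostnames begin with "gateway", as gateway1 and gateway2 do in these slides, so check this with hostname on Shaheen first.

```shell
# Hypothetical guard: stop a script if it is started on a gateway node.
# Assumption: gateway hostnames begin with "gateway" (e.g. gateway1, gateway2).
refuse_on_gateway() {
    local host="${1:-$(hostname)}"
    case "$host" in
        gateway*)
            echo "refusing to run on $host: book compute nodes with salloc first" >&2
            return 1
            ;;
    esac
    return 0
}
```

Usage: put `refuse_on_gateway || exit 1` at the top of your own job scripts.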
Shaheen's node and file system environment...
[Figure: Shaheen II nodes (login nodes cdl1 to cdl4, Gateway1, Gateway2, compute nodes) and file systems: /project is /lustre/project, /scratch is /lustre/scratch, and /home/userxxxx is in reality /lustre/scratch/userxxx]
● The Urika-XC stack is available from a gateway and a compute node only.
● All configuration files and conda environments usually live in your /home directory.
● Keep in mind: your home is /scratch/userxxxx!
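A quick way to check this yourself is to resolve your home directory and ask which file system holds it. The function name below is hypothetical. readlink -f only reveals something if /home/$USER is a symbolic link, while df -P reports the file system and mount point either way.

```shell
# Hypothetical helper: show where a directory (by default $HOME) really lives.
where_is_home() {
    local dir="${1:-$HOME}"
    # Follows symbolic links, e.g. /home/userxxxx -> /lustre/scratch/userxxxx.
    echo "resolved: $(readlink -f "$dir")"
    # POSIX df output: line 2 holds the device (column 1) and mount point (column 6).
    df -P "$dir" | awk 'NR == 2 { print "filesystem: " $1 " mounted on " $6 }'
}
```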
You’re almost set!
● As your /home directory is in reality on /scratch, every file untouched during the last 60 days…
→ build any valuable environment you wish to keep (conda environments, notebooks...) in your project directory, and create symbolic links to it from your /scratch/$USER:
mkdir -p /project/kxxxx/$USER/NOTEBOOKS
mkdir -p /project/kxxxx/$USER/.conda
cd /scratch/$USER
ln -s /project/kxxxx/$USER/NOTEBOOKS NOTEBOOKS
ln -s /project/kxxxx/$USER/.conda .conda
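The same setup can be made safe to re-run. The sketch below wraps the commands above in a hypothetical shell function, and kxxxx is still the project-ID placeholder from the slides. If a real directory already exists in /scratch/$USER (for example a .conda created before you followed this advice), the function leaves it untouched instead of overwriting it. In that case you would move its content to /project by hand.

```shell
# Hypothetical helper: keep NOTEBOOKS and .conda on /project and link them
# from /scratch/$USER, your real home. kxxxx is the slides' placeholder.
link_to_project() {
    local user_name="${USER:-$(id -un)}"
    local project_dir="${1:-/project/kxxxx/$user_name}"
    local scratch_dir="${2:-/scratch/$user_name}"
    local name target link
    for name in NOTEBOOKS .conda; do
        target="$project_dir/$name"
        link="$scratch_dir/$name"
        # Persistent copy on /project.
        mkdir -p "$target" || return 1
        if [ -L "$link" ]; then
            echo "already linked: $link -> $(readlink "$link")"
        elif [ -e "$link" ]; then
            # A real directory is already there: never overwrite user data.
            echo "skipping $link: it exists and is not a symlink" >&2
        else
            ln -s "$target" "$link" || return 1
            echo "linked: $link -> $target"
        fi
    done
}
```

Usage: `link_to_project` with your real project ID in place of kxxxx, or `link_to_project PROJECT_DIR SCRATCH_DIR` to pass both paths explicitly.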
You’re almost set!
● To use the tools seamlessly, you also need to set up passwordless ssh connections between the scheduled nodes.
→ set up passwordless ssh private and public keys as explained at
https://www.hpc.kaust.edu.sa/analytics/ssh_keys
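For reference, the usual way to do this is sketched below. It assumes, as the slides indicate, that your home directory (and therefore ~/.ssh) is shared by all nodes, so authorizing your own public key is enough. The function name is hypothetical, and the KAUST page above remains the authoritative procedure. The key is created without a passphrase, so keep ~/.ssh private.

```shell
# Hypothetical helper: create a passphrase-less key pair and authorize it for
# your own account. Assumes ~/.ssh is shared by every node (home on Lustre).
setup_node_ssh_keys() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local key="$ssh_dir/id_rsa"
    local auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"
    if [ ! -f "$key" ]; then
        # -N "" sets an empty passphrase so node-to-node logins need no password.
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key" || return 1
    fi
    touch "$auth"
    # Append our public key only once.
    if ! grep -qxF "$(cat "$key.pub")" "$auth"; then
        cat "$key.pub" >> "$auth"
    fi
    chmod 600 "$auth"
}
```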
So how does it work? Which commands? Where does it happen?
● Go to gateway2: cdl1% ssh gateway2
● Load the modules: gateway2% module load analytics
● Book the nodes: gateway2% salloc -N 3 (at least 3 nodes are needed for distributed Spark, Dask or TensorFlow)
● Start the environment: gateway2% start_analytics
● Or do the last two steps in one:
gateway2% salloc -N 3 start_analytics
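If you often switch between node counts and partitions, a small helper can assemble the one-step command and remind you of the three-node minimum. This is a hypothetical convenience function, not part of the analytics module. It only prints the command. The debug partition used in the test is the one that appears later in these slides.

```shell
# Hypothetical helper: print the one-step "salloc ... start_analytics" command.
# Usage: analytics_session_cmd [nodes] [partition]
analytics_session_cmd() {
    local nodes="${1:-3}"
    local partition="${2:-}"
    case "$nodes" in
        ''|0|*[!0-9]*)
            echo "node count must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$nodes" -lt 3 ]; then
        echo "warning: at least 3 nodes are needed for distributed Spark, Dask or TensorFlow" >&2
    fi
    local cmd="salloc -N $nodes"
    if [ -n "$partition" ]; then
        cmd="$cmd -p $partition"
    fi
    echo "$cmd start_analytics"
}
```

Usage, on gateway2 after module load analytics: `eval "$(analytics_session_cmd 3)"`.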
You’re all set!
● Spark, conda and R are all installed and available for you.
● Do not hesitate to create your own conda environment and add conda packages if needed:
conda create -p /scratch/userxxx/.conda/my_env
conda activate /scratch/userxxx/.conda/my_env
conda install pyspark
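The slides recommend keeping valuable environments in /project, so it can be worth checking where an environment such as /scratch/userxxx/.conda/my_env physically lives. The helper below is a hypothetical sketch. It resolves the path through any symbolic links and flags anything that ends up on /scratch or /lustre/scratch.

```shell
# Hypothetical helper: report whether a path (e.g. a conda env) resolves to scratch.
env_storage() {
    local env_path="$1"
    local real
    # readlink -f needs existing parent directories; fall back to the raw path.
    real="$(readlink -f "$env_path" 2>/dev/null)" || real="$env_path"
    case "$real" in
        /scratch/*|/lustre/scratch/*) echo "scratch: $real" ;;
        *) echo "other: $real" ;;
    esac
}
```

Usage: `env_storage /scratch/userxxx/.conda/my_env` should not answer "scratch" if .conda is linked into /project.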
Why use R from Urika-XC?
Already tuned for XC by Cray
© CRAY, courtesy of Dr James D. Maltby
Jupyter notebooks and dashboards made available
Interactive session only!
© CRAY, courtesy of Dr James D. Maltby
+ JupyterLab
How to reach your user interface? Jupyter? TensorBoard?
[Figure: Shaheen II layout (login nodes cdl1 to cdl4, Gateway1, Gateway2, compute nodes, /home and /lustre), with port 8888 forwarded from one hop to the next]
Only the cdls are visible from your laptop. How to reach there?
start_analytics --ssh-tunnel 8080:8080
laptop% ssh -L 8888:localhost:8888 cdl2
cdl2% ssh -L 8888:localhost:8888 gateway2
gateway2% module load analytics
gateway2% salloc -N 1 -p debug
gateway2% start_analytics --login-port 8888 --ui-port 8888
jupyter notebook
Choose one forwarded port per user (here 20080): the cdls and gateway2 are shared, and a given port can only be bound by one process at a time on each node, so everybody using 8888 would collide.
laptop% ssh -L 20080:localhost:20080 cdl2
cdl2% ssh -L 20080:localhost:20080 gateway2
gateway2% module load analytics
gateway2% salloc -N 1 -p debug
gateway2% start_analytics --login-port 20080 --ui-port 20080
jupyter notebook
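One way to get a port that is unlikely to collide is to derive it from your numeric user ID. This is a hypothetical convention (base 20000 plus the UID modulo 10000), not a KSL rule. Two users can still collide, so if a forward fails with "address already in use", pick another port. The second function simply prints the tunnel-and-launch sequence above for a given port.

```shell
# Hypothetical convention: port = base + (UID mod 10000), in 20000-29999 by default.
pick_user_port() {
    local base="${1:-20000}"
    local uid="${2:-$(id -u)}"
    echo $(( base + uid % 10000 ))
}

# Print the laptop -> cdl2 -> gateway2 -> compute node sequence for a given port.
print_tunnel_commands() {
    local port="$1"
    if [ -z "$port" ]; then
        echo "usage: print_tunnel_commands PORT" >&2
        return 1
    fi
    echo "ssh -L $port:localhost:$port cdl2"
    echo "ssh -L $port:localhost:$port gateway2"
    echo "module load analytics"
    echo "salloc -N 1 -p debug"
    echo "start_analytics --login-port $port --ui-port $port"
    echo "jupyter notebook"
}
```

Usage: `print_tunnel_commands "$(pick_user_port)"`. Run the first line on your laptop, the second on cdl2 and the remaining ones on gateway2, as above.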
Save the forwarding in .ssh/config
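Saving the forwarding means adding a LocalForward line to each Host entry you ssh into: one entry on your laptop for the cdl, and one on the cdl for gateway2. The helper below is a hypothetical sketch that appends such an entry only if the Host alias is not already defined. The alias shaheen matches the next slide. The HostName behind it should be whatever address you normally use to reach a cdl.

```shell
# Hypothetical helper: append "Host ALIAS" with a LocalForward to an ssh config.
# Usage: add_forward_host CONFIG ALIAS HOSTNAME PORT
add_forward_host() {
    local config="$1" host_alias="$2" host_name="$3" port="$4"
    if [ -z "$port" ]; then
        echo "usage: add_forward_host CONFIG ALIAS HOSTNAME PORT" >&2
        return 1
    fi
    mkdir -p "$(dirname "$config")" || return 1
    touch "$config" && chmod 600 "$config" || return 1
    # Do not duplicate an existing entry for the same alias.
    if grep -q "^Host $host_alias\$" "$config"; then
        echo "Host $host_alias already defined in $config, leaving it unchanged" >&2
        return 0
    fi
    # LocalForward syntax: local port, then destination host:port on the remote side.
    cat >> "$config" <<EOF

Host $host_alias
    HostName $host_name
    LocalForward $port localhost:$port
EOF
}
```

Usage: on your laptop, `add_forward_host ~/.ssh/config shaheen <your cdl address> 20080`; on the cdl, `add_forward_host ~/.ssh/config gateway2 gateway2 20080`.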
With the port forwarding set in .ssh/config, the sequence becomes:
laptop% ssh shaheen
cdl% ssh gateway2
gateway2% module load analytics
gateway2% salloc -N 1 -p debug
gateway2% start_analytics --login-port 20080 --ui-port 20080
jupyter notebook
laptop% ssh cdl2
cdl2% ssh gateway2
gateway2% module load analytics
gateway2% salloc -N 1 -p debug
gateway2% start_analytics --login-port 20080 --ui-port 20080
jupyter notebook
[Screenshots: .ssh/config on my laptop and on cdl3]
https://www.hpc.kaust.edu.sa/jupyter
Cray Graph Engine
© CRAY, courtesy of Dr James D. Maltby
So how does it work? Which commands? Where does it happen?
● To use the web interface, you need to build a tunnel, just as for Jupyter notebooks.
● Full instructions are detailed at
https://www.hpc.kaust.edu.sa/cray-graph-engine
Use JupyterLab instead of Jupyter Notebook:
conda install jupyterlab
Questions?
http://hpc.kaust.edu.sa/analytics
https://pubs.cray.com/content/S-2589/1.1.UP00/xctm-series-urika-xc-analytic-applications-guide/about-urika-xc