Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds...

20
Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds www.arc.leeds.ac.uk

Transcript of Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds...

Page 1: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Containers and HPC:Singularity at Leeds

Martin CallaghanAdvanced Research Computing, University of Leedswww.arc.leeds.ac.uk

Page 2: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

IntroductionI’m Martin Callaghan

Research Computing Consultant at the University of Leeds

Page 3: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

This talk● Introductions and ethos● Our journey to containerisation● Supporting users● Our approach to building containers● Case Study 1: Deep Learning and GPUs● Case Study 2: WRF and MPI● Case Study 3: R and ‘strange’ dependencies● Conclusion and Questions

Page 4: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

HPC and Research Computing at LeedsSmall team (just 4 of us)

5 clusters (~15000 cores, GPUs, Xeon Phis)

~700 active research users

Traditional HPC users

New users in Biological Sciences, Business & Economics, Humanities, Data Analytics

Need to remain agile and responsive to changing research needs

Page 5: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Our journey towards containersPressure from new (and existing) users:

● New codes & applications● Interactivity● Visualisation

Pressure from technology:

● Cloud● Accelerators● New languages and paradigms

Page 6: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Managing applications

Environment Modules

Package Management

User codes

Containers

Where do containers fit in our current model of supplying and supporting applications?

Page 7: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

User support and trainingHPC and Research Computing isn’t about doing things

FOR or TO

It’s about doing things

WITH

Support, Training, Consultancy and Outreach go a long way towards promoting the ethos of WITH and developing trusting partnerships.

Page 8: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

ARC training

Introduction to HPC

Introduction to Application

Development

HPC Architectures and Parallel

Programming

Introduction to MPI and MPI/IO

Data Carpentry

Software Carpentry

Introduction to Linux

Containers for Research

* HPC and Data Analysis

Version control with Git and Github

High Performance Python

Scientific Python

* Software Development

practices for research

http://arc.leeds.ac.uk/training

Scientific Visualisation

ARC/HPC Training Portfolio

* in development

GPU Programming

Page 9: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

What’s a container?Containers allow us to:

● Wrap up a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries all on top of your favourite Linux distro.

● Almost a Lightweight VM, but containers share the kernel with other containers and processes, running as isolated processes in user space.

● A good way of providing a reproducible environment.

Page 10: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

What problems can they help solve?● Dependency Hell● Imprecise Documentation● Barriers to adoption and reuse● Code rot: esp. outdated software dependencies● Users who need a ‘special’ and quickly…● Increasingly, software is available prebuilt in a container (OpenFoam, WRF)● Sharing stacks

Could use…

VMs, Workflow tools, Package Managers

Page 11: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

The case for containers in research

Container technologies can support reproducible computational research by allowing a complete research environment to be scripted and shared.

Page 12: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Our approach to building containersLayers and templates: Building Blocks

Script everything: no writable containers

Let users do as much as possible (we supply a VM)

No VM or root access?

Page 13: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Singularity workflow

Page 14: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Leverage the available toolsIntegrate Github and Singularity for a CI workflow:

Create definition file

PUSH WEBHOOK

PULL

Singularity Hub:BUILD and TEST

Page 15: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Case Study 1: Deep Learning and GPUsNew researchers with new needs:

● Theano● Tensorflow● Keras● Python● Jupyter Notebooks

○ Interactive use○ Batch use

Page 16: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Case Study 1: SolutionSingle container with complete stack

Will run Jupyter Notebook

Some (clunky) SSH tunnelling and users can access a JN server running on a GPU compute node from their local browser.

Page 17: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Case Study 2: WRF● WRF= Weather Research and Forecasting

○ A popular climate modelling application○ Can be a little awkward to install- lots of dependencies

● This was a good opportunity to test out Singularity’s MPI compatibility (we used OpenMPI).

● It works● No appreciable overheads to using the containerised model over regular build

Page 18: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Case Study 3: R and strange dependenciesFork of R

User familiarity with a single Linux distribution

Cascading and complex dependencies

R container runs as application under SGE as task array to calculate cycling routes.

Page 19: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Other interesting possibilitiesApache Spark

NoSQL databases

Windows applications running under WINE

Checkpointing containers with DMTCP

Just using containers for some applications, eg. OpenFOAM

Page 20: Containers and HPC: Singularity at Leeds - GitHub Pages · Containers and HPC: Singularity at Leeds Martin Callaghan Advanced Research Computing, University of Leeds ... High Performance

Any questions?