Distributed Development, Centralised Delivery - SAGrid Jenkins + CVMFS

Jenkins + CVMFS: Distributed Development, Centralised Delivery
Bruce Becker, Coordinator: SAGrid (SANREN, Meraka Institute, CSIR)
Stefanus Riekert, HPC Application Engineer (University of the Free State)

description

Presentation on the status of the SAGrid application porting platform based on Jenkins and CVMFS, given at the EGI Community Forum 2014.


Bruce Becker: Coordinator, SAGrid | [email protected] | http://www.sagrid.ac.za

Outline

● What users want
● SAGrid VO – a catch-all VO with many applications
● Problem statements:
  ● Problem 1: "the usual problem" – maintaining applications in a distributed computing environment
  ● Problem 2: "another usual problem" – maintaining a complex application inventory
● General solution: CVMFS + Jenkins
● Some specifics of the SAGrid CI platform
● Outlook


SAGrid as a catch-all VO

● The South African National Grid operates a catch-all VO which all South African researchers can use to access computing and data resources.

● SAGrid VO is not a domain-specific VO, so:
  ● several widely varying uses for the applications supported by this VO
  ● applications requested by users or communities themselves


What users want

● Amazing infrastructure
● Some users want a highly varied, modular application selection
● Vertically integrated, highly specialised applications
● Highly trained support


What users get sometimes


The problem (1) - ”the usual problem”

● Software distribution was done mostly "by hand":
  ● Someone from the ops team develops a script to install the application
  ● Apps installed via job submission
  ● Tags applied via script or by the job itself
● Issues:
  ● Major overhead of work
  ● Inconsistent installation procedures between applications and sites
  ● Bottleneck in porting applications (has to be done by someone in the VO)
  ● Duplication of effort, especially in dependencies of applications
  ● Difficult to manage application lifecycles


The problem (2) - what about the community ?

● Managing the inventory in a catch-all VO can be complex when there are many applications

● Prioritising porting requests depends on the knowledge of the expert porting the application
  ● Can lead to major delays in porting and deploying applications
● However, a user or community usually has an expert who knows how to tune, port and configure the application properly, as well as its dependencies
  ● Usually, "they" have to conform to "us" – learn grid tools and terminology, etc.


Problem (3): Changes to the playing field

● New middleware stacks

● New architectures – GPGPU, ARM


Questions to answer

● How do we lower the barrier to entry to the grid or cloud infrastructure?
● How can the application expert prove to the resource provider that the application will actually run on the execution environment of the site?
● How can we manage the lifecycle of applications across multiple versions, architectures, configurations?
● How can we ensure that once applications are "certified", they are actually available on as many sites as possible?


General Solution: Jenkins + CVMFS

● The issues outlined are ”typical” in a large software project

● Usually solved by judicious use of a Continuous Integration system

● Once applications have been ”ported”, put them into a trusted repository

● Previously – built RPMs, but these required site-admin intervention

● With CVMFS – one-time configuration at each site
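A rough idea of what that one-time configuration involves on the client side is sketched below. It is not the SAGrid setup: the repository name `sagrid.ac.za` comes from later in the talk, while the proxy, the server URL and the cache quota are placeholders. The script writes into a scratch directory; on a real node the files would normally sit under `/etc/cvmfs`, and the repository's public key would also have to be installed.

```shell
#!/bin/sh
# Sketch only: writes CVMFS client settings into a scratch directory.
# On a real worker node these files normally live under /etc/cvmfs;
# the proxy and server URL below are placeholders, not SAGrid values.
CONF_ROOT="${CONF_ROOT:-$(mktemp -d)}"
REPO="sagrid.ac.za"                              # repository name from the talk
PROXY="http://squid.example.org:3128"            # hypothetical site proxy
STRATUM="http://cvmfs.example.org/cvmfs/@fqrn@"  # hypothetical server URL

mkdir -p "$CONF_ROOT/config.d"

# Client-wide settings: which repositories to mount, via which proxy,
# and how much local cache (in MB) the client may use.
cat > "$CONF_ROOT/default.local" <<EOF
CVMFS_REPOSITORIES=$REPO
CVMFS_HTTP_PROXY="$PROXY"
CVMFS_QUOTA_LIMIT=20000
EOF

# Repository-specific settings: where the repository is served from.
cat > "$CONF_ROOT/config.d/$REPO.conf" <<EOF
CVMFS_SERVER_URL="$STRATUM"
EOF

echo "wrote client configuration under $CONF_ROOT"
```

Once files like these are in place, newly published applications appear on the site without further admin action, which is the contrast with the RPM approach.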


First, some changes

● Distribute the effort, centralise the tools
● Move repository from "closed" SVN repo – https://ops.sagrid.ac.za/trac/svn/repo
● to git – https://github.com/SAGridOps/SoftwareInstallation
● Don't have to give write access to a single repo, instead accept pull requests
● Take advantage of all the GitHub infrastructure
● Expand possible contributors to those "outside" the infrastructure
● Recognise individuals' contribution


Recognise individuals...


Decentralise the team


Collaborate with code


Let the robots do the work

● Define what we want to deploy – let the experts take care of how to deploy

● DevOps paradigm – same review/tag/release mechanisms on operations code as we have for scientific applications
  ● Teach a marketable skill
  ● Allow specialisation
  ● Enable remote management of complex services
  ● Ensure that the published methodology is the adopted methodology


Quality Control and feedback

● Ensure that requested applications are included in the repo

● Provide testing and QA infrastructure

● Self-serve to users


The CI environment

● Jenkins is extremely flexible... can do almost anything
● AuthN/AuthZ
  ● Currently using GitHub OAuth
  ● Take advantage of future Identity Federation
● We wanted to simulate different execution environments
  ● Already in production
  ● Planned for future
● Track and re-use dependencies


Matrix-based builds

● Independent builds and build statuses for different configurations:
  ● Application name
  ● Version
  ● OS
  ● Architecture
  ● … can add specific tuning configurations...

● We can see exactly what's broken where – build more resilient integration code.
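The matrix idea can be illustrated outside Jenkins with a few nested loops: every combination of the axes is built on its own and gets its own status. This is a toy sketch, not the SAGrid job configuration; the axis values and the `build_one` placeholder (which deliberately fails for one cell) are invented for illustration.

```shell
#!/bin/sh
# Sketch of a build matrix: one independent status per configuration cell.
# All axis values below are illustrative.
APPS="fftw/2.1.5 gsl/1.16"     # name/version pairs
OSES="sl6 u1404"               # hypothetical OS labels
ARCHES="x86_64 i386"           # hypothetical architecture labels

# Placeholder for the real build step; it fails for exactly one cell so
# that the report shows a single broken configuration among healthy ones.
build_one() {
    [ "$1/$2/$3" != "gsl/1.16/u1404/i386" ]
}

# Walk every cell of the matrix and print "app os arch status".
run_matrix() {
    for app in $APPS; do
        for os in $OSES; do
            for arch in $ARCHES; do
                if build_one "$app" "$os" "$arch"; then
                    status=SUCCESS
                else
                    status=FAILURE
                fi
                printf '%s %s %s %s\n' "$app" "$os" "$arch" "$status"
            done
        done
    done
}

run_matrix
```

Because a failure in one cell does not stop the others, the report pinpoints which combination is broken rather than marking the whole application as failed.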


Typical workflow

[Workflow diagram; labels: Infrastructure expert, Application developer, Dev/Stage env., Testing matrix]

● Defines relevant tests in Jenkins
● Reads description of execution environment tests
● Writes code to pass required tests
● Promote a build to CVMFS
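The final promotion step could look roughly like the sketch below. The talk lists the promoted-builds mechanism among its next steps, so this is a guess at its shape rather than the authors' implementation. `promote` and its arguments are hypothetical; `cvmfs_server transaction`, `publish` and `abort` are the standard CVMFS server commands, and they are only called when a repository name is given and the tool is installed.

```shell
#!/bin/sh
# promote ARTIFACT TARGET_DIR [REPO]
# Unpack a tested build artifact into TARGET_DIR. When REPO is given and
# cvmfs_server is installed, wrap the copy in a CVMFS transaction.
promote() {
    artifact=$1; target=$2; repo=${3:-}
    [ -f "$artifact" ] || { echo "no such artifact: $artifact" >&2; return 1; }

    use_cvmfs=no
    if [ -n "$repo" ] && command -v cvmfs_server >/dev/null 2>&1; then
        use_cvmfs=yes
        cvmfs_server transaction "$repo" || return 1
    fi

    if mkdir -p "$target" && tar -xzf "$artifact" -C "$target"; then
        # Publish only if a transaction was opened above.
        [ "$use_cvmfs" = no ] || cvmfs_server publish "$repo"
    else
        # Roll back the open transaction on failure.
        [ "$use_cvmfs" = no ] || cvmfs_server abort -f "$repo"
        return 1
    fi
}

# Demo in a scratch directory, without touching any real repository.
work=$(mktemp -d)
mkdir -p "$work/build/apprepo/demo/1.0"
echo "hello" > "$work/build/apprepo/demo/1.0/README"
tar -czf "$work/build.tar.gz" -C "$work/build" apprepo
promote "$work/build.tar.gz" "$work/published" && ls "$work/published/apprepo/demo/1.0"
```

On a real publishing host the call would be along the lines of `promote build.tar.gz /cvmfs/sagrid.ac.za sagrid.ac.za`, triggered only for builds that passed the whole testing matrix.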


Dependency management: simple case

● Common problem with applications : need a specific version of a compiler

● Compiling the compiler can itself be tricky...

● Jenkins tests the full dependency chain necessary


Real-world application

● GADGET – astrophysics hydrodynamic simulations

● Many (levels of) dependencies
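One way to picture a multi-level dependency chain is as a list of "build this before that" pairs, which the standard `tsort` utility turns into a valid build order. The edges below are illustrative only, loosely based on the modules named in the talk; they are not the actual SAGrid job graph, where Jenkins tracks upstream and downstream jobs itself.

```shell
#!/bin/sh
# Each line reads "dependency dependent": the left item is built first.
# Illustrative edges only, not the real GADGET dependency graph.
deps() {
cat <<EOF
gcc fftw
gcc gsl
gcc zlib
gcc openmpi
zlib hdf5
openmpi hdf5
fftw gadget
gsl gadget
hdf5 gadget
openmpi gadget
EOF
}

# tsort prints the nodes so that every dependency precedes its dependents.
build_order() {
    deps | tsort
}

build_order
```

In this toy graph the compiler always comes out first and GADGET last; the order of the libraries in between may vary, as long as each one follows everything it depends on.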


Public Application Dashboard


Authenticated view


Generic build script

Set up the environment:

    # GADGET requires HDF5 FFTW2 ZLIB and openmpi
    module add ci
    module add fftw/2.1.5
    module add hdf5
    module add openmpi
    module add gsl

Clean build, retrieve dependency artifacts:

    rm -rf $FFTW_DIR
    tar xvfz /repo/$SITE/$OS/$ARCH/fftw/$FFTW_VERSION/build.tar.gz -C /
    rm -rf $HDF5_DIR
    tar xvfz /repo/$SITE/$OS/$ARCH/hdf5/$HDF5_VERSION/build.tar.gz -C /
    rm -rf $OPENMPI_DIR
    tar xvfz /repo/$SITE/$OS/$ARCH/openmpi/$OPENMPI_VERSION/build.tar.gz -C /
    rm -rf $GSL_DIR
    tar xvfz /repo/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz -C /
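The repeated clean-and-unpack pairs in the script could be folded into one helper. The sketch below is one possible refactoring, not part of the original script: `fetch_dep`, `REPO_ROOT` and `INSTALL_ROOT` are hypothetical names, with defaults matching the `/repo` and `/` paths on the slide, and the demo values for `SITE`, `OS` and `ARCH` are invented so that it can be tried in a scratch directory.

```shell
#!/bin/sh
# fetch_dep NAME VERSION [DIR]
# Unpack the build artifact of one dependency, optionally removing a
# stale copy in DIR first. REPO_ROOT and INSTALL_ROOT are overridable.
fetch_dep() {
    name=$1; version=$2; dir=${3:-}
    tarball="${REPO_ROOT:-/repo}/$SITE/$OS/$ARCH/$name/$version/build.tar.gz"
    [ -f "$tarball" ] || { echo "missing artifact: $tarball" >&2; return 1; }
    # Remove a stale copy first, but only when a directory was actually given.
    [ -z "$dir" ] || rm -rf -- "$dir"
    tar -xzf "$tarball" -C "${INSTALL_ROOT:-/}"
}

# Demo with a fake artifact in a scratch directory (all values hypothetical).
demo=$(mktemp -d)
SITE=generic; OS=sl6; ARCH=x86_64
REPO_ROOT="$demo/repo"; INSTALL_ROOT="$demo/root"
mkdir -p "$REPO_ROOT/$SITE/$OS/$ARCH/gsl/1.16" "$INSTALL_ROOT" "$demo/stage/apprepo/gsl"
echo "placeholder" > "$demo/stage/apprepo/gsl/lib.txt"
tar -czf "$REPO_ROOT/$SITE/$OS/$ARCH/gsl/1.16/build.tar.gz" -C "$demo/stage" apprepo
fetch_dep gsl 1.16 "$INSTALL_ROOT/apprepo/gsl"
```

In the build job itself this would shrink the retrieval step to one line per dependency, for example `fetch_dep fftw "$FFTW_VERSION" "$FFTW_DIR"`, and a missing artifact would fail the build with a clear message.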


Generic build script

Actually build... Create the artifact:

    make install DESTDIR=$WORKSPACE/build
    mkdir -p $REPO_DIR
    rm -rf $REPO_DIR/*
    tar -cvzf $REPO_DIR/build.tar.gz -C $WORKSPACE/build apprepo

Create the modulefile:

    # $NAME and $VERSION are expanded by the shell; $::env(...) is left for Tcl
    cat <<MODULE_FILE > modules/$VERSION
    #%Module1.0
    ## $NAME modulefile
    ##
    proc ModulesHelp { } {
        puts stderr "       This module does nothing but alert the user"
        puts stderr "       that the [module-info name] module is not available"
    }
    prereq gsl
    prereq fftw/2.1.5
    prereq hdf5
    module-whatis   "$NAME $VERSION."
    setenv       GSL_VERSION       $VERSION
    setenv       GSL_DIR           /apprepo/$::env(SITE)/$::env(OS)/$::env(ARCH)/$NAME/$VERSION
    prepend-path LD_LIBRARY_PATH   $::env(GSL_DIR)/lib
    MODULE_FILE


So, it works! … almost

Next steps

● We have an open, collaborative, low-barrier platform for researchers to bring applications to the grid

● Small technical tasks:
  ● Implement promoted builds mechanism to populate sagrid.ac.za CVMFS repo
  ● Implement SAML AuthN, integrate IdF
  ● Probes to check that CVMFS is mounted on sites (?)

● Operating in "stealth mode" at the moment – not advertising, but open to anyone who is interested, so that we can collect feedback

● Addressing specific user communities to test drive the system:
  ● Machine learning astro applications (rapid prototyping)
  ● Bioinformatics application suites (complex ecosystem)

● Present next phase of the project in November in Cape Town – move to production
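For the probe idea raised in the list above, a very small check might be enough as a starting point. This is a sketch under assumptions: `probe_repo` is a hypothetical helper, the path `/cvmfs/sagrid.ac.za` is inferred from the repository name in the talk, and the exit codes follow the common Nagios convention (0 OK, 1 warning, 2 critical). The CVMFS client also ships its own `cvmfs_config probe` command, which may be preferable where it is available.

```shell
#!/bin/sh
# probe_repo PATH
# Succeeds when PATH is a directory with at least one entry.
# Exit codes follow the Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL.
probe_repo() {
    path=$1
    [ -d "$path" ] || { echo "CRITICAL: $path not mounted"; return 2; }
    [ -n "$(ls -A "$path" 2>/dev/null)" ] || { echo "WARNING: $path is empty"; return 1; }
    echo "OK: $path is available"
}

# Default path is an assumption based on the repository name in the talk.
# "|| true" keeps the demo from aborting on machines without CVMFS.
probe_repo "${CVMFS_PATH:-/cvmfs/sagrid.ac.za}" || true
```

Submitted as a short grid job or run from the site monitoring system, a check like this would show on which sites the certified applications are actually reachable.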