-
Galaxy Australia
A public bioinformatics compute resource distributed nationally.
Simon Gladman (presented by Andrew Lonie)
Supported by
Development partners
-
Outline
● A bit about Galaxy Australia
● Nationally distributed compute & Pulsar
● Dynamic job handling & DTD
● Reference data management & CVMFS
● Monitoring and stats collection
● Next?
-
What is Galaxy?
-
What is Galaxy?
● Web-based bioinformatics platform
● Retains “histories” of analyses
● Has a large number of cutting-edge tools
● Conda environments/Singularity containers for dependency management
● Has an app store
● Has a large support community
-
Galaxy Australia Infrastructure Locations: 2015-2017
● Galaxy-QLD: 96 cores - 60 TB disk
● Galaxy-Mel: 64 cores - 50 TB disk
-
Galaxy Australia Infrastructure Locations: 2018-now
● Galaxy Australia (Main Queue): 150 cores - 60 TB disk
● Pulsar-Mel: ~90 cores
● Pulsar-NCI: ~60 cores
-
Galaxy Australia - Usage
• Public and free Galaxy server in Australia
• Currently running > 50,000 jobs per month.
-
Galaxy-Aus Architecture
[Diagram: Brisbane main site plus the Melbourne and Canberra Pulsar clusters]
● Brisbane (main): head node (m2.xlarge: 16 vCPU | 64 GB RAM | 500 GB /mnt) running 2 web handlers (x6 threads) and 2 job mules in 1 farm, with some Cloudman changes; DB server (m1.large: 4 vCPU | 16 GB RAM) running PostgreSQL on a 100 GB volume; CEPH filesystem; NFS server for job working and TMP directories; Slurm queue “main” of m1.xxlarge worker nodes; Kubernetes for Integrated Environments.
● Melbourne Pulsar (training jobs): head node (m2.xlarge) plus m1.xlarge worker nodes, with NFS & CVMFS reference data.
● Canberra Pulsar (large/long jobs): head node (m2.xlarge) plus m1.xxlarge worker nodes, with NFS & CVMFS reference data.
● Workers are a minimal GVL image, version controlled, updated, with telegraf. They can be swapped into the cluster by draining and re-deploying (see the sketch below).
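That drain-and-redeploy swap comes down to standard Slurm node-state commands; a sketch, with a hypothetical node name and reason:

    # Take the worker out of service: running jobs finish, no new jobs land
    scontrol update NodeName=worker-03 State=DRAIN Reason="GVL image refresh"
    # ...redeploy the node from the current GVL image, then return it to service
    scontrol update NodeName=worker-03 State=RESUME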
-
Galaxy-Aus Architecture (second view)
[Diagram: the same Brisbane/Melbourne/Canberra layout; here the Brisbane head node (m2.xlarge) runs 2 web handlers and 4 job mules in 1 farm, storage is GPFS, and a separate services server hosts the Slurm controller and NFS server. Worker-node notes as above.]
-
What is Pulsar?
● Python server application
● Allows a Galaxy server to run jobs on a remote system
● No shared file system required
● Configurable
● Securable
● Can submit jobs to an HPC queueing system
● Automatically handles dependency management
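To make this concrete, here is a minimal sketch of how a Galaxy job destination can point at a remote Pulsar endpoint in job_conf.xml; the URL and token are placeholders, not Galaxy Australia's actual settings.

    <!-- job_conf.xml (sketch): route jobs to a remote Pulsar server over REST.
         The URL and token below are placeholders. -->
    <job_conf>
        <plugins>
            <plugin id="pulsar" type="runner"
                    load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
        </plugins>
        <destinations default="pulsar_mel">
            <destination id="pulsar_mel" runner="pulsar">
                <param id="url">https://pulsar-mel.example.org:8913/</param>
                <param id="private_token">changeme</param>
                <!-- resolve tool dependencies on the Pulsar side -->
                <param id="dependency_resolution">remote</param>
            </destination>
        </destinations>
    </job_conf>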
-
How Pulsar works
1. User clicks “Execute”
2. Galaxy packs up and sends:
○ Data
○ Config files
○ Tool name & version
○ Parameters and other job metadata
3. Pulsar accepts the job
4. Pulsar checks if the tool is installed locally
○ If not, installs the tool with Conda or Docker
5. Pulsar submits the job to the local queue
6. Pulsar waits until the job completes
7. Pulsar packs up the result and sends it back to Galaxy
-
What we use it for
Pulsar-Mel
● Training jobs / workshops
● Runs lots of small jobs, usually 2-4 cores
● Allows a tutorial or workshop to run without interfering with the main queue
Pulsar-NCI
● Large, long-running jobs - jobs that could run for days
● Usually 16+ core jobs with lots of disk I/O
● Genome assemblies / transcriptome assemblies
-
Dynamic Tool Destinations
● Tool vs destination rules
● Written in YAML (see the sketch below)
● Can have rules for:
○ File size
○ User
○ Tool parameters
● More complex rules as Python functions
● Can be altered without a Galaxy restart
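A sketch of what such a rules file looks like; the tool and destination names here are hypothetical, and the layout follows the DTD tool_destinations.yml format.

    # tool_destinations.yml (sketch) - tool and destination names are hypothetical
    verbose: True
    default_destination: slurm_main
    tools:
      spades:
        default_destination: slurm_main
        rules:
          - rule_type: file_size      # route large inputs to the big cluster
            nice_value: 0
            lower_bound: 200 MB
            upper_bound: Infinity
            destination: pulsar_nci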
-
The problem of reference data: there’s a lot and it’s hard to deal with
● 1000s of genomes available, from multiple sources
● > 200 available on Galaxy Australia, and growing
● Multiple tool indices for each
● User requests
● Large compute burden to create indices; admin burden to manage them
-
So we started talking with other large Galaxies!
Share the burden:
● Admin of references
● Compute requirements
We decided to share it all via CernVM-FS
● Currently controlled by the Galaxy Project
● Moving to a community model soon
-
CVMFS global structure
[Diagram: Stratum 0 server (data.galaxyproject.org, at Penn State) mirrored by Stratum 1 servers in Melbourne, Germany, and elsewhere; Galaxy AU, Galaxy EU, Galaxy Main, GVL instances, and other compute servers each have a primary mount from their nearest Stratum 1, with fallback mounts to the others]
● Stratum 0: the canonical source; transactional updates
● Stratum 1: multiple servers mirroring the Stratum 0 server; continuous updates
● User servers: many servers mounting the repo from a Stratum 1, chosen via the GEO API, with fallback to other Stratum 1s
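On a client, mounting the shared reference data is a few lines of CVMFS configuration plus a probe; a sketch, with illustrative Stratum 1 hostnames (the repository public key must also be installed under /etc/cvmfs/keys/):

    # /etc/cvmfs/default.local (sketch - the proxy choice depends on your site)
    CVMFS_REPOSITORIES=data.galaxyproject.org
    CVMFS_HTTP_PROXY=DIRECT

    # /etc/cvmfs/domain.d/galaxyproject.org.conf - hostnames are illustrative
    CVMFS_SERVER_URL="http://cvmfs1-mel0.gvl.org.au/cvmfs/@fqrn@;http://cvmfs1-psu0.galaxyproject.org/cvmfs/@fqrn@"

    # verify, then browse the reference data
    cvmfs_config probe data.galaxyproject.org
    ls /cvmfs/data.galaxyproject.org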
-
stats.genome.edu.au
-
Structure
[Diagram: the Galaxy Australia head node and worker nodes, the Pulsar nodes, and other servers all feed metrics to stats.genome.edu.au]
● telegraf: runs on each machine as a daemon
● InfluxDB: holds the Galaxy-Aus, Pulsar-Mel, and Pulsar-NCI metrics
● Grafana: app & web server
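A minimal sketch of that per-machine telegraf daemon's configuration; the output endpoint and database name are placeholders, not the production stats.genome.edu.au settings.

    # /etc/telegraf/telegraf.conf (sketch)
    [agent]
      interval = "10s"

    [[inputs.cpu]]
    [[inputs.mem]]
    [[inputs.disk]]

    [[outputs.influxdb]]
      urls = ["https://influx.example.org:8086"]   # placeholder endpoint
      database = "galaxy_aus"                      # placeholder database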
-
All very nice tech, but is it used??
-
What’s next?
● More compute nodes around the country
● Cloudstor?
● BYO resources
● More internationalisation of resources
-
● https://twitter.com/galaxyaustralia
● https://www.embl-abr.org.au/galaxyaustralia/
● https://usegalaxy.org.au/