Pulsar-Network Documentation

Release 0.1

Gianmauro Cuccuru, Marco Antonio Tangaro

Mar 15, 2022


Pulsar Endpoint Configuration

Contents

1 Introduction
    1.1 Minimal setup
    1.2 Architecture
2 Requirements
    2.1 Software needed
    2.2 OpenStack configuration
    2.3 VGCN image
3 Preparation
    3.1 Pre-tasks
4 Building the Pulsar endpoint
    4.1 Testing SSH access
5 RabbitMQ configuration
6 Pulsar configuration
    6.1 Prerequisites
    6.2 Pulsar configuration
7 useGalaxy.eu configuration
    7.1 Destination creation
    7.2 Runner creation
    7.3 Pull request to useGalaxy.eu
8 Terraform variables details
    8.1 Configuration options
9 Benchmarking your Pulsar setup
    9.1 Configure the benchmarker
    9.2 Run the benchmark
    9.3 Analyze the results
10 Adding GPUs to your Pulsar setup
    10.1 Prerequisites
    10.2 Software provided
    10.3 Configuration
    10.4 Test your setup
11 Retries of the staging actions
12 Jump host
    12.1 Terraform steps
    12.2 SSH configuration
13 Our Partnerships
14 Pulsar Network status
    14.1 Pulsar network endpoints
15 Indices and tables

The Pulsar Network is a wide job execution system distributed across several European data centers, allowing Galaxy instances to scale their computing power over heterogeneous resources.

This documentation shows how to install and configure a Pulsar network endpoint on an OpenStack cloud infrastructure and how to connect it to the useGalaxy.eu server. The same Pulsar endpoint can be associated with any Galaxy instance, if properly configured.


CHAPTER 1

Introduction

This framework exploits HashiCorp Terraform to perform the installation and configuration of a Pulsar endpoint on OpenStack using a virtual image named VGCN (see Requirements for more details).

The Terraform script needs to access an OpenStack cloud via API to:

• upload a ~4 GB VM image (preferred; preloading it via the Dashboard interface is also supported);

• access an IPv4 external network (i.e. a network that can reach the Internet);

• rely on a DHCP server enabled on the external network that can provide one public IP;

• create an IPv4 private network for the VMs (preferred; using a pre-existing network of this kind is also supported);

• create a router to bridge the private network to the external network (optional if this feature is provided by the cloud network infrastructure);

• create one central manager where all services are installed;

• create one NFS server;

• create N worker nodes;

• attach a storage volume to the NFS server;

• create three security groups (secgroups);

• upload an SSH public key to access the Central Manager VM.

1.1 Minimal setup

A minimal setup requires:

• Central manager and NFS server nodes, each with 4 cores and 8 GB of RAM


• Computational workers, each with 4-8 cores and 16 GB of RAM

• 300+ GB volume

but the more the better.
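As an illustration, the minimal setup above maps onto two of the variables in vars.tf roughly as follows (a sketch only: the flavor names are placeholders and must be replaced with flavors that actually exist on your tenant; see the Terraform variables details section):

variable "flavors" {
  type = "map"
  default = {
    # placeholder flavor names: pick tenant flavors with ~4 cores / 8 GB RAM
    "central-manager" = "m1.medium"
    "nfs-server"      = "m1.medium"
    # worker nodes: ~4-8 cores / 16 GB RAM each
    "exec-node"       = "m1.large"
    "gpu-node"        = "m1.medium"
  }
}

variable "nfs_disk_size" {
  # 300+ GB volume attached to the NFS server
  default = 300
}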

1.2 Architecture

UseGalaxy.eu and the remote Pulsar endpoints communicate through a RabbitMQ queue.

The interesting aspect of this setup is that the Pulsar endpoints do not need to expose any network ports to the outside, because:

• the Galaxy and the Pulsar servers exchange messages through RabbitMQ;

• the Pulsar server starts all the staging actions, reaching the Galaxy server through its API.

1.2.1 Dependencies

All tool dependencies are resolved by UseGalaxy.eu through a container resolver mechanism, providing a proper Singularity image for each job. Singularity images are made available locally through a distribution system enabled by a network of CVMFS servers.
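As a quick sanity check on a running node (a minimal sketch, assuming the CVMFS client shipped with the VGCN image is already configured for the Galaxy project repositories), you can probe the repository that distributes the Singularity images:

# probe the CVMFS repository serving the Singularity images
cvmfs_config probe singularity.galaxyproject.org

# the images are then visible under /cvmfs
ls /cvmfs/singularity.galaxyproject.org/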


CHAPTER 2

Requirements

2.1 Software needed

Ansible, Make and Unzip are needed to follow these instructions.

Ansible can be easily installed following the documentation.
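For example, one common route (an assumption here; see the official Ansible documentation for the method recommended for your distribution) is installing it with pip:

$ pip install --user ansible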

Make and Unzip can be installed on a Linux machine as described here:

On Ubuntu 18.04

# apt-get install make unzip

On CentOS 7

# yum install make unzip

2.2 OpenStack configuration

Access to an OpenStack tenant is needed to perform the Pulsar endpoint installation.

To allow Terraform to access the tenant, download the OpenStack RC File (v2.0 or v3) from your OpenStack dashboard and source it.

$ source pulsar-network-openrc.sh

Pulsar needs both a private and a public network to work properly. Moreover, the private network needs access to the Internet. This is needed to allow the Pulsar compute nodes to mount CVMFS repositories to get access to Galaxy reference data or containers.
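If the OpenStack command line client is installed (it is not required by the Terraform recipes, this is only a convenience check), you can verify that the credentials were sourced correctly and inspect the available networks and flavors:

$ openstack network list
$ openstack flavor list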


2.3 VGCN image

The Pulsar network exploits a virtual image, named VGCN, with everything necessary to create a Pulsar Network node.

The image contains:

• HTCondor

• Pulsar

• NFS

• CVMFS

• Singularity

• Docker

and must be available in your Tenant.

Depending on the OpenStack configuration, you may or may not be able to upload the image by URL. In the first case the image can be installed straightforwardly using the Preparation recipes.


Alternatively, the OpenStack Horizon Dashboard allows uploading an image by URL or from a local file.

Note: The current VGCN image is located here.


CHAPTER 3

Preparation

This step will create several resources needed by Pulsar on the cloud infrastructure, which will be used during the node configuration:

• the virtual image

• the private network

• the router

This step can be skipped if the resources above are available or provided by your cloud infrastructure in other ways. More details below.

3.1 Pre-tasks

Warning: Source the tenant RC file (see section Requirements) before starting the installation procedure, otherwise Terraform will not be able to perform resource creation and configuration.

1. Fork the usegalaxy.eu GitHub repository pulsar-infrastructure.

2. Locally clone the forked repository:

git clone https://github.com/<user-name>/pulsar-infrastructure.git

3. Navigate into the pulsar-infrastructure directory:

cd pulsar-infrastructure

4. Execute the pre_tasks step using the Makefile.

The Makefile exploits Terraform workspaces, defining the workspace name using the environment variable WS.


To automatically set up your environment, set WS before the make command, e.g. WS=<workspace-name> make pre_tasks. A new directory named <workspace-name> is created, with the Terraform files needed to run the pre_tasks configuration.

Choose a label for your Terraform environment, for example test01:

WS=test01 make pre_tasks

It will create a directory named test01 with the following files:

$ ls test01
ext_network.tf  pre_tasks.tf  providers.tf  vars.tf

5. Edit the <workspace-name>/pre_tasks.tf file according to your needs. It has three sections:

• Upload the virtual machine image via the OpenStack API. This block should be commented out if the image is already available on your tenant or if you upload it via the dashboard interface.

• Create the private network. This block should be commented out if the network is already available.

• Create a router to ensure the private network will be able to reach the Internet. This block should be commented out if this feature is provided by the network infrastructure.

6. Edit the <workspace-name>/vars.tf file to configure the Pulsar endpoint.

Note: All the variables available in the vars.tf file are described in the Terraform variables details section.

7. Validate the Terraform recipes configuration:

WS=test01 make plan

8. Run the pre_tasks recipes:

WS=test01 make apply

The resources described in the pre_tasks.tf file are now created on your OpenStack tenant.
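Optionally, you can double-check the created resources with the OpenStack client (assuming it is installed; the actual names depend on what you configured in pre_tasks.tf and vars.tf):

$ openstack image list      # the uploaded VGCN image
$ openstack network list    # the private network
$ openstack router list     # the router towards the external network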


CHAPTER 4

Building the Pulsar endpoint

After running the pre-tasks recipes and having properly edited the vars.tf file (see section Terraform variables details), we are ready to create the Pulsar endpoint.

Navigate into the Pulsar infrastructure directory:

cd pulsar-infrastructure

and execute:

WS=<workspace-name> make init

WS=<workspace-name> make plan

WS=<workspace-name> make apply

The apply command outputs the IP addresses of the Pulsar Central Manager:

...
openstack_compute_instance_v2.exec-node: Still creating... (10s elapsed)
openstack_compute_instance_v2.exec-node: Creation complete after 17s (ID: 046f2d5e-5bf8-4e75-8015-4e6a4f96fb9d)

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

ip_v4_internal = 172.30.135.5
ip_v4_public = 90.147.170.170
node_name = vgcn-it02-central-manager.pulsar

Finally, all the resources have been created on OpenStack.

Here, for example, the OpenStack dashboard shows a Pulsar endpoint with the Central Manager, the NFS server and two worker nodes.


The Pulsar endpoint is now configured, but Pulsar is still turned off. In the next step we will configure Pulsar to talk to the usegalaxy.eu RabbitMQ and enable it.

4.1 Testing SSH access

The SSH public key configured in the vars.tf file was already automatically added to the authorized_keys file of the Central Manager VM. To log in to this VM just type:

ssh -i <private_ssh_key> <Central-Manager-Public-IP-address> -l centos

Note: The Terraform scripts also add a VGCN private SSH key to the CM and the corresponding public key to the other nodes. So, after successfully logging in to the CM, you can reach the rest of the network without further impediments.
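For example, once logged in to the CM you can inspect the pool and hop to any other node of the endpoint; the worker hostname below is hypothetical, check the OpenStack dashboard or the condor_status output for the actual names:

# on the Central Manager: list the HTCondor nodes, then jump to one of them
condor_status
ssh vgcn-test01-exec-node-0.pulsar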


CHAPTER 5

RabbitMQ configuration

This step describes how to make a PR to the UseGalaxy.eu GitHub repository to create a new RabbitMQ account for your Pulsar endpoint.

1. Fork the usegalaxy.eu GitHub repository infrastructure-playbook.

2. Clone the forked repository:

git clone https://github.com/<user-name>/infrastructure-playbook.git

3. Edit the file infrastructure-playbook/group_vars/mq.yml with your favourite text editor:

• in the rabbitmq_users section, create a new user by adding the following lines:

rabbitmq_users:
  ...
  - user: <user_name>
    password: "{{ <rabbit_mq_password_for_your_user> }}"
    vhost: /pulsar/<user_name>

For example, the configuration for the it02 Pulsar endpoint is:

rabbitmq_users:
  ...
  - user: galaxy_it02
    password: "{{ rabbitmq_password_galaxy_it02 }}"
    vhost: /pulsar/galaxy_it02

• Add your virtual host, previously configured, to the rabbitmq_vhosts section:

rabbitmq_vhosts:
  ...
  - /pulsar/<user_name>

For example, the configuration for the it02 virtual host is:


rabbitmq_vhosts:
  ...
  - /pulsar/galaxy_it02

4. Make a Pull Request to the original repository and contact the usegalaxy.eu admins. Once merged, the UseGalaxy.eu team will provide you the RabbitMQ queue URL by mail, which needs to be added to your Pulsar configuration as described in the next step.

The queue URL will look like this:

pyamqp://galaxy_it02:*****@mq.galaxyproject.eu:5671//pulsar/galaxy_it02?ssl=1


CHAPTER 6

Pulsar configuration

In this step we will describe how to configure the Pulsar endpoint and turn it on. The RabbitMQ URL, described in the previous step (RabbitMQ configuration), is needed to proceed.

6.1 Prerequisites

6.1.1 hostname configuration

We need to refer to a proper FQDN hostname for your Pulsar server; if it doesn't have one, you can easily create one on your local machine in this way:

Add the following line to your /etc/hosts file:

<Central-Manager-Public-IP-address> <pulsar-fqdn-hostname> <pulsar-endpoint-name>

Where <Central-Manager-Public-IP-address> is the public IP address of the Central Manager VM, <pulsar-fqdn-hostname> is the FQDN hostname and <pulsar-endpoint-name> is the custom name of your endpoint.

For example:

• it02 is the pulsar-endpoint-name

• it02.pulsar.galaxyproject.eu is the <pulsar-fqdn-hostname>

• 90.147.170.170 is the Central-Manager-Public-IP-address
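With the it02 example values above, the resulting /etc/hosts line would read:

90.147.170.170 it02.pulsar.galaxyproject.eu it02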

6.2 Pulsar configuration


1. Clone the Pulsar infrastructure playbook GitHub repository:


git clone https://github.com/usegalaxy-eu/pulsar-infrastructure-playbook.git

2. Create several needed files.

Enter the pulsar-infrastructure-playbook directory:

cd pulsar-infrastructure-playbook

and execute:

make preparation FQDN=pulsar-fqdn-hostname

Make performs the following changes:

• Updates the inventory file, adding an entry for your Pulsar endpoint

• Creates a job_metrics file in files/pulsar-fqdn-hostname

• Creates a host vars file in host_vars/pulsar-fqdn-hostname

• Creates a template YAML file for your endpoint in templates/pulsar-fqdn-hostname

3. Revise/update all the files created in the step above according to your setup requirements.

4. Configure the Pulsar endpoint using the Makefile.

First of all, check if everything is fine with:

$ make check

and, finally, apply changes to Pulsar and turn it on:

$ make apply


CHAPTER 7

useGalaxy.eu configuration

Your endpoint is now able to take jobs! The last step is to enable your endpoint on useGalaxy.eu, thus allowing the European Galaxy Server to send jobs to the new endpoint. Therefore a new destination and a new runner need to be added to UseGalaxy.eu. As the entire configuration of the European Galaxy Server is available on GitHub, you can do this as well.

7.1 Destination creation

Edit the file infrastructure-playbook/files/galaxy/dynamic_rules/usegalaxy/destination_specifications.yaml and add a new destination at the end.

remote_cluster_mq_<custom-suffix>:
  limits:
    cores: <number-of-available-cpus-per-node>
    mem: <ram-available-per-node>
  env:
    GALAXY_MEMORY_MB: '{MEMORY_MB}'
    GALAXY_SLOTS: '{PARALLELISATION}'
    LC_ALL: C
    SINGULARITY_CACHEDIR: /data/share/var/database/container_cache
  params:
    priority: -{PRIORITY}
    submit_request_cpus: '{PARALLELISATION}'
    submit_request_memory: '{MEMORY}'
    jobs_directory: '/data/share/staging'
    default_file_action: 'remote_transfer'
    dependency_resolution: 'none'
    outputs_to_working_directory: False
    rewrite_parameters: True
    transport: 'curl'
    singularity_enabled: true
    singularity_default_container_id: '/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04'


• Replace the <custom-suffix> with a code which identifies your country and a progressive number indicating the Pulsar installation, e.g. it02 for the second installation of Pulsar in Italy.

• The <number-of-available-cpus-per-node> is the number of available cores each Pulsar worker node provides.

• The <ram-available-per-node> is the RAM of the Pulsar worker nodes in GB. This number will be converted to MB, i.e. multiplied by 1024. Therefore, to avoid out-of-range memory values, we recommend using a conservative value, for example by decreasing the entered value by 1 GB.
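As an illustration, for a hypothetical it02 endpoint whose worker nodes provide 8 cores and 16 GB of RAM each, the beginning of the destination would look like this (the numbers are only an example; the env and params blocks stay as in the template above):

remote_cluster_mq_it02:
  limits:
    cores: 8
    # 16 GB nodes, reduced by 1 GB to stay within range
    mem: 15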

7.2 Runner creation

Edit the file infrastructure-playbook/group_vars/galaxy.yml and add in the galaxy_jobconf section an entry corresponding to your Pulsar endpoint. Customize the id and insert the RabbitMQ URL, replacing the password with {{ rabbitmq_password_galaxy_<custom_suffix> }}.

For example, the runner added for the it02 Pulsar node is:

galaxy_jobconf:
  ...
  - id: pulsar_eu_it02
    load: galaxy.jobs.runners.pulsar:PulsarMQJobRunner
    params:
      amqp_url: "pyamqp://galaxy_it02:{{ rabbitmq_password_galaxy_it02 }}@mq.galaxyproject.eu:5671//pulsar/galaxy_it02?ssl=1"
      galaxy_url: "https://usegalaxy.eu"
      manager: production
      amqp_acknowledge: True
      amqp_ack_republish_time: 300
      amqp_consumer_timeout: 2.0
      amqp_publish_retry: True
      amqp_publish_retry_max_retries: 60

7.3 Pull request to useGalaxy.eu

Finally, these changes must be merged into the main branch of the infrastructure-playbook repository through a Pull Request.

Warning: Enabling these changes on usegalaxy.eu requires at least one working day.


CHAPTER 8

Terraform variables details

Each Pulsar endpoint infrastructure can be configured by editing the vars.tf file, located in the Terraform workspace directory.

Navigate into the Pulsar infrastructure directory:

cd pulsar-infrastructure

and edit the file <workspace-name>/vars.tf with your favourite text editor.

8.1 Configuration options

8.1.1 nfs_disk_size

Description Size of the NFS storage drive. We suggest at least 300 GB of storage. The NFS will store all data that is processed at a given time. This includes input, output, intermediate data and Singularity images (if Singularity is used).

Example

variable "nfs_disk_size" {default = 300

}

8.1.2 flavors

Description Instance flavour names of the available flavours on your OpenStack tenant. central-manager and nfs-server have minimal requirements, e.g. 4 virtual CPUs and 8 GB of RAM for each of them. The exec-node configures the HTCondor worker nodes. Its configuration depends on the availability on your cloud provider. At least 16 vCPUs and 32 GB RAM are recommended, but in the end it depends on the tools/workflows you want to process on your Pulsar endpoint. The gpu-node is optional and identifies the size of the GPU nodes.


Example

variable "flavors" {type = "map"default = {"central-manager" = "m1.medium""nfs-server" = "m1.medium""exec-node" = "m1.medium""gpu-node" = "m1.medium"

}}

8.1.3 exec_node_count

Description Set the number of HTCondor Compute Worker Nodes.

Example

variable "exec_node_count" {default = 2

}

8.1.4 gpu_node_count

Description Set the number of HTCondor GPU Worker Nodes.

Example

variable "gpu_node_count" {default = 0

}

8.1.5 image

Description Set the VGCN image to use. Terraform expects to find it on OpenStack with the name specified in this field. More on the VGCN image here. The image name must match the name of the image on the OpenStack tenant. This field can be left unchanged.

Example

variable "image" {type = "map"default = {"name" = "vggp-v31-j132-4ab83d5ffde9-master""image_source_url" = "https://usegalaxy.eu/static/vgcn/vggp-v31-j132-

→˓4ab83d5ffde9-master.raw""container_format" = "bare""disk_format" = "raw"

}}


8.1.6 public_key

Description Defines an SSH public key, which will be inserted in the Central Manager VM, allowing the user to access it. The SSH public key has to be copied into the pubkey field.

Example

variable "public_key" {type = "map"default = {name = "key_label"pubkey = "ssh-rsa blablablabla..."

}}

8.1.7 name_prefix

Description Defines the name prefix for the resources created on OpenStack. This field can be left unchanged.

Example

variable "name_prefix" {default = "vgcn-"

}

8.1.8 name_suffix

Description Defines the name suffix for the resources created on OpenStack. This field can be left unchanged.

Example

variable "name_suffix" {default = ".pulsar"

}

8.1.9 secgroups_cm

Description Defines the security groups of the Central Manager VM. This field can be left unchanged.

Example

variable "secgroups_cm" {type = "list"default = ["vgcn-public-ssh","vgcn-ingress-private","vgcn-egress-public",

]}


8.1.10 secgroups

Description Defines the security groups for the NFS server, compute nodes and GPU nodes. This field can be left unchanged.

Example

variable "secgroups" {type = "list"default = ["vgcn-ingress-private","vgcn-egress-public",

]}

8.1.11 public_network

Description Defines the name of the public network, allowing access to the Internet. Depending on the cloud provider IaaS configuration, if the network already exists, the default value should match the name of the public net.

Example

variable "public_network" {default = "public"

}

8.1.12 private_network

Description Defines the name of the private network among the nodes. Depending on the cloud provider IaaS configuration, if the network already exists, the name should match the name of the private net and the subnet_name should match the name of the subnet. The associated subnet cidr4 also needs to be configured to match the private subnet range.

Example

variable "private_network" {type = "map"default = {name = "vgcn-private"subnet_name = "vgcn-private-subnet"cidr4 = "192.168.199.0/24"

}}

8.1.13 ssh-port

Description Defines the SSH port. The default is set to 22. This field can be left unchanged.

Example

variable "ssh-port" {default = "22"

}


CHAPTER 9

Benchmarking your Pulsar setup

To determine the performance of your Pulsar setup, you can use the GalaxyBenchmarker. This is a framework that allows for easy benchmarking of various Galaxy job destinations.

In the following steps, we will make use of a docker-compose setup.

As a start, clone the repository to your local workstation:

git clone https://github.com/usegalaxy-eu/GalaxyBenchmarker.git

A docker-compose setup is already provided that includes InfluxDB and Grafana for easy analysis of the metrics.

9.1 Configure the benchmarker

The benchmarker is configured using a YAML configuration file. You can find a basic example in benchmark_config.yml.usegalaxyeu. As a start, rename it to benchmark_config.yml and fill in the credentials for your regular UseGalaxy.eu user (a deeper explanation of the configuration can be found at https://github.com/usegalaxy-eu/GalaxyBenchmarker). To later benchmark your Pulsar setup, we can configure your UseGalaxy.eu user (or an additional one) to launch all jobs against your setup. For this, send an email to [email protected].

Also add the user key for this user to the config file under destinations.

The benchmarker uses the test functionality of Planemo to submit workflows. The docker-compose setup already clones the examples from https://github.com/usegalaxy-eu/workflow-testing. These are available at the path /workflow-testing inside the benchmarker container. Some of the workflows there are already included in the example config file. You can also add your own workflows: simply add the workflows somewhere in the root folder of the repository, which is mounted under /src inside the benchmarker container, and mention them in the config file.

workflows:
  - name: YourCustomWorkflow
    type: Galaxy
    path: /src/custom-workflow-folder/custom-workflow.ga


The benchmarker can submit jobs to multiple job destinations to compare the performance of each. As a start, we can stay with the one destination already defined.

destinations:
  - name: YourDestinationName
    type: Galaxy
    galaxy_user_key: YOUR-USER-KEY-FOR-DESTINATION-USER

Now, we just need to define the actual benchmark that we want to perform. Simply define the workflows that should be run and the destinations to which the workflows should be submitted.

benchmarks:
  - name: BenchmarkName
    type: DestinationComparison
    destinations:
      - YourDestinationName
    workflows:
      - ard
      - adaboost
      - gromacs
      - mapping_by_sequencing
    runs_per_workflow: 5
    warmup: true

9.2 Run the benchmark

After everything has been set, we can start the actual benchmarking. Simply run:

docker-compose up

This will spin up an InfluxDB and a Grafana container, together with a container running the benchmarker. You can monitor the progress in the console. If you need to re-run a benchmark, just press Ctrl-C and run the docker-compose up command again.

Warning: The InfluxDB data is stored inside the container. Before running docker-compose down, remember to back up your data!

9.3 Analyze the results

After the benchmarking has finished, you can view the results using the Grafana dashboards at http://localhost:3000 (username: admin, password: admin). These dashboards are still a work in progress and you may need to alter them for your case; however, they may be a good starting point.


CHAPTER 10

Adding GPUs to your Pulsar setup

GPU devices are nowadays widely adopted to accelerate computationally intensive tasks, leveraging the intrinsic parallel computation capability of this kind of hardware. If your cloud provider makes GPUs available to your tenant, you can effectively apply them in many scientific contexts like molecular docking, prediction and searching of molecular structures, or machine learning applications.

In the following steps, we describe how to add a GPU device to the computation cluster created following the instructions provided in the sections above.

10.1 Prerequisites

You need to know the name of the OpenStack flavor that can be used to instantiate a VM with one or more GPU devices attached, and the number of such VMs that can be created.

10.2 Software provided

The VGCN image provides all the software needed to enable an NVIDIA GPU and to submit a GPU job to the HTCondor queue manager, also through a Docker container.

The current VGCN image provides the following packages to your VMs:

• CUDA toolkit 10.1

• Docker version 19.03.8

• NVIDIA Container toolkit 1.1.1

Note that the NVIDIA software will be installed at runtime, by a cloud-init task, during the first boot.
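As a rough sketch of how a GPU node is addressed by HTCondor (illustrative only; in normal operation the jobs are submitted by Pulsar, not by hand), a manual test submit file requesting one GPU could look like this:

# gpu-test.sub - hypothetical manual test job, submitted with: condor_submit gpu-test.sub
universe       = vanilla
executable     = /usr/bin/nvidia-smi
request_gpus   = 1
request_cpus   = 1
request_memory = 1GB
output         = gpu-test.out
error          = gpu-test.err
log            = gpu-test.log
queue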


10.3 Configuration

In the preparation step, you created a directory named <workspace-name>; inside, you have a vars.tf file with all the parameters to configure the Pulsar endpoint.

Edit the variables flavors and gpu_node_count in <workspace-name>/vars.tf, replacing the default values with your own details.

Example

variable "flavors" {type = "map"default = {"central-manager" = "m1.medium""nfs-server" = "m1.medium""exec-node" = "m1.medium""gpu-node" = "gpu_flavor_name" <--

}}

variable "gpu_node_count" {default = 10 <--

}

Now you can validate the new Terraform configuration:

WS=<workspace-name> make plan

and if the previous step does not show any errors, you can go forward and apply the new configuration:

WS=<workspace-name> make apply

10.4 Test your setup

Access one of your shiny new workers with a GPU enabled and type:

nvidia-smi

You will receive an output like this:

$ nvidia-smi
Tue May 19 17:51:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    21W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

and the same with the latest CUDA docker image:

$ docker run --gpus all nvidia/cuda:10.1-base nvidia-smi
Tue May 19 16:08:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    20W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


CHAPTER 11

Retries of the staging actions

In the Pulsar setup described in these pages, the staging actions are carried out by the Pulsar server itself.

Several control parameters have been added to the configuration to ensure reliable communication between the Galaxy server and the remote Pulsar server. The aim of these parameters is to control the retrying of staging actions in the event of a failure.

Parameters and default values are:

preprocess_action_max_retries: 10
preprocess_action_interval_start: 2
preprocess_action_interval_step: 2
preprocess_action_interval_max: 30
postprocess_action_max_retries: 10
postprocess_action_interval_start: 2
postprocess_action_interval_step: 2
postprocess_action_interval_max: 30

and are included in the app.yml file.

For each action (input/preprocess and output/postprocess), you can specify:

• the maximum number of retries before giving up.

• how long to sleep before the first retry (in seconds).

• by how much the interval is increased for each retry (in seconds).

• the maximum number of seconds to sleep between retries.
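For instance, with the default values above and assuming a linear back-off (as the parameter names suggest), a failing preprocess action is retried up to 10 times, sleeping 2 s before the first retry, then 4 s, 6 s and so on, with each pause capped at 30 s.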

In the following box, as an example, we have the values used in a Pulsar server with a really problematic network connection:

preprocess_action_max_retries: 30
preprocess_action_interval_start: 2
preprocess_action_interval_step: 10
preprocess_action_interval_max: 300
postprocess_action_max_retries: 30
postprocess_action_interval_start: 2
postprocess_action_interval_step: 10
postprocess_action_interval_max: 300


CHAPTER 12

Jump host

The Pulsar server setup described in this website instructs you to assign a public IP to your Pulsar server and then expose its SSH port through a public network, but if you are in the lucky situation of having a jump host available, you can avoid using the public IP.

A jump host leverages a Secure Shell (SSH) technique that allows you to reach the Pulsar server from your notebook through a gateway. These kinds of hosts are placed at the boundary between the internal and the external network and act as a single point of entry, providing transparent management of devices within the internal network.

12.1 Terraform steps

There are four files that need to be modified to have Terraform create a Pulsar setup without a public IP:

• main.tf

• exec_nodes.tf

• gpu_nodes.tf

• output.tf

In particular, in main.tf remove the first network block:

network {
  uuid = "${data.openstack_networking_network_v2.external.id}"
}

In exec_nodes.tf and gpu_nodes.tf change the network index from 1 to 0 in the CONDOR_HOST variable:

CONDOR_HOST = ${openstack_compute_instance_v2.central-manager.network.0.fixed_ip_v4}

and do the same, in output.tf, for the value variable:


value = "${openstack_compute_instance_v2.central-manager.network.0.fixed_ip_→˓v4}"

The Pulsar setup fr01, created by GenOuest, is a working example of this configuration, and the details are available in the pulsar-infrastructure repo.

12.2 SSH configuration

Now we show how to create a simple jump host configuration with the following details:

• Jump_IP (the jump host using a public IP)

• Destination_IP (the pulsar server using a private IP only)

First, be sure you are able to SSH from your notebook into the Jump_IP and then, from there, to the Destination_IP. Once done, you can configure SSH on your notebook in this way:

nano ~/.ssh/config

and paste the following configuration:

Host pulsar_host
    Hostname Destination_IP
    User centos
    IdentityFile /path/to/your/private_ssh_key
    IdentitiesOnly yes
    PasswordAuthentication no
    Port 22
    ProxyCommand ssh -q -W %h:%p USERNAME@Jump_IP

and from now on, you can easily access your Pulsar server from your notebook (through the jump host) with the command:

ssh pulsar_host


CHAPTER 13

Our Partnerships

UFR
University of Freiburg
https://www.uni-freiburg.de/

de.NBI Cloud
German Network for Bioinformatics Infrastructure Service, Training, Cooperations & Cloud Computing
https://www.denbi.de/

CESNET
Association of universities of the Czech Republic and the Czech Academy of Sciences
https://www.cesnet.cz/


IBIOM-CNR
Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies
http://www.ibiom.cnr.it/

ReCaS-Bari
National Institute of Nuclear Physics (INFN-Bari) DataCenter
https://www.recas-bari.it/index.php/en/
INFN and ReCaS-Bari are also part of the EGI-ACE H2020 project.

VIB
Life sciences research institute, based in Flanders, Belgium
http://www.vib.be/en/Pages/default.aspx

VSC
Vlaams Supercomputer Centrum
https://www.vscentrum.be/


Tecnico ULisboa
Universidade de Lisboa
https://tecnico.ulisboa.pt/en/

GenOuest
Regional bioinformatics platform in Rennes, France. Member of the national institute IFB.
https://www.genouest.org/

Diamond Light Source
UK's national synchrotron science facility
https://www.diamond.ac.uk/Home.html

UIB
University of Bergen
https://www.uib.no/en


CSC
IT Center for Science Ltd.
https://www.csc.fi/

Melbourne Bioinformatics
Bioinformatics at the University of Melbourne
https://www.melbournebioinformatics.org.au/


CHAPTER 14

Pulsar Network status

The statistics page for the Pulsar network connected to useGalaxy.eu is available here.

14.1 Pulsar network endpoints

The Pulsar endpoints with the corresponding compute time are reported in the following. The compute time is calculated from the assigned galaxy_slots multiplied by runtime_seconds for each job. It does not calculate the exact CPU usage, but it gives an approximation of what was allocated.
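In other words, for each endpoint the reported value is approximately:

compute_time = sum over its jobs of (galaxy_slots * runtime_seconds)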


14.1.1 de.NBI Cloud

14.1.2 AG Backofen (HPC)

14.1.3 Freiburg - de.NBI cloud - DE01

14.1.4 Tübingen - de.NBI cloud - DE02

14.1.5 Freiburg - GPUs - de.NBI cloud

14.1.6 CESNET - Czech Republic

14.1.7 Bari - ReCaS - Italy - IT01

14.1.8 Bari - ReCaS - Italy - IT02

14.1.9 Brussel - VIB - Belgium - BE01

14.1.10 Lisbon - Tecnico ULisboa - Portugal - PT01

14.1.11 Rennes - GenOuest - France - FR01

14.1.12 Oxfordshire - Diamond Light Source - United Kingdom - UK01

14.1.13 Espoo - CSC: IT Center for Science Ltd. - Finland - FI01


CHAPTER 15

Indices and tables

• genindex

• modindex

• search
