HPC at NIBR · Novartis Institutes for Biomedical Research HPC at NIBR: Community • We’ve...


HPC at NIBR

Nick Holway

Scientific Computing Group, NIBR

HPC Advisory Council, Lugano

Twitter: @nickholway LinkedIn: https://www.linkedin.com/in/nickholway/

April 2017

Novartis Institutes for Biomedical Research (NIBR)


HPC means we can get better, more targeted drugs to patients quicker

Public


Novartis Institutes for Biomedical Research

Today’s talk

1. Introducing Novartis

2. A very quick introduction to drug discovery

3. What our HPC looks like

4. Examples of how we use HPC to accelerate drug discovery

5. Outlook


Introducing Novartis


Novartis

Figure: Novartis organisation chart showing Innovative Medicines (Pharmaceuticals business unit and Oncology business unit), Sandoz and Alcon, plus R&D


Drug discovery and early development

• ~6,000 scientists at 7 sites globally

• ~90 New Molecular Entities

• ~400 research projects

• >500 ongoing clinical trials

• >400 computational scientists


Organised around prevalent Disease Areas

Figure: distribution of ~90 New Molecular Entities at NIBR by disease area

• ATI: Autoimmunity, Transplantation & Inflammation

• CVM: Cardiovascular & Metabolism

• ID: Infectious Diseases

• IO: Immuno-Oncology

• MSD: Musculoskeletal

• NEURO: Neuroscience

• ONC: Oncology

• OPH: Ophthalmology

• RESP: Respiratory Diseases

• Other


A very quick introduction to drug discovery


What’s a drug?

• “A pharmaceutical drug is a drug used to diagnose, cure, treat, or prevent disease” (https://en.wikipedia.org/wiki/Pharmaceutical_drug)

• Examples of some medicines and their “mechanisms”:

– Cancer: stopping cells from dividing by disrupting DNA replication or the cells’ internal skeletons

– Infectious diseases: disrupting bacterial cell walls

– Depression: preventing serotonin re-uptake in nerve cells in the brain


The path to a new medicine

• Total time: 10–15 years

• Discovery

– Target selection

– Drug research and design

– Preclinical research

– Investigational New Drug (IND) application submitted

• Clinical trials

– Proof of Concept: 5–15 patients

– Phase I: 20–100 healthy volunteers and/or patients

– Phase II: 100–500 patients

– Phase III: 1000–5000 patients

– NDA/BLA* submitted

• Evaluation

– Submission and review by regulatory authority

– Approval of one new medicine

• Post-approval

– Phase IV: post-marketing surveillance and research

• Manufacturing

• Compounds: >10,000 at the start, narrowing to <250 and then <5

*New Drug Application / Biologics Licence Application


What our HPC looks like


HPC at NIBR - Hardware

• x86 servers

– Intel Xeon CPUs

– 128–768 GB RAM

– FDR InfiniBand

– 10GigE

• Specialised nodes

– Nvidia GPUs

– ≥1 TB RAM

• Isilon storage

– CIFS/NFS

– 10GigE to Arista switches

• Lustre

– Scratch


HPC at NIBR - Software

• RHEL 6.x

• Univa Grid Engine for scheduling

• Software compilation & configuration

– EasyBuild

– Modules

– GCC, Intel, Nvidia compilers

• Languages: C++, Fortran, CUDA, Python, R, Matlab

• Libraries: various MPI implementations, MKL, etc.

• The software stack is identical on Linux desktops and “scientific servers”
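The slide names Univa Grid Engine as the scheduler. As a minimal sketch of how an image-analysis or docking workload might be handed to it, the Python below assembles (but does not run) a `qsub` array-job command, in which every task receives its own `$SGE_TASK_ID`. The script name, the `h_vmem` memory resource and the `smp` parallel environment are hypothetical: resource and parallel-environment names are configured per site, so check `qconf -sc` and `qconf -spl` on your own cluster.

```python
import shlex


def build_qsub_command(script, n_tasks, job_name, mem_per_slot="4G",
                       slots=1, parallel_env="smp"):
    """Return the argv list that submits `script` as a Grid Engine array job.

    Hypothetical, site-specific choices: memory is requested through the
    `h_vmem` resource and multi-core tasks use a parallel environment
    called "smp". Both are configured by each cluster's administrators.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    cmd = [
        "qsub",
        "-N", job_name,                   # job name shown by qstat
        "-cwd",                           # run in the submission directory
        "-t", f"1-{n_tasks}",             # array job: tasks 1..n_tasks, each gets $SGE_TASK_ID
        "-l", f"h_vmem={mem_per_slot}",   # per-slot memory request
    ]
    if slots > 1:
        cmd += ["-pe", parallel_env, str(slots)]  # cores per task
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    cmd = build_qsub_command("analyse_plate.sh", n_tasks=384, job_name="hcs_plate")
    print(shlex.join(cmd))
    # To submit for real: subprocess.run(cmd, check=True)
```

One array job instead of hundreds of separate `qsub` calls keeps the queue manageable, and each task can use `$SGE_TASK_ID` to decide which well or image set it processes.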


HPC at NIBR - Humans

• HPC is provided by the Scientific Computing Group (SciComp)

• Global team (Europe, USA, Asia)

• Complementary backgrounds and skills

– Sysadmins

– Mathematicians

– Scientists

• HPCwire award winners in 2014

• Other teams in NIBR Informatics provide storage, Linux servers, etc.


HPC at NIBR: Community

• We’ve worked very hard to build an interdisciplinary group of informatics scientists to share knowledge

• Various activities

– Fortnightly informal talks

– Social events

– Deep Learning “bootcamp”

– 24-hour virtual multi-site workshop (Shanghai -> California!)

• This started out as a grassroots effort and is now formally funded within the Company


HPC elsewhere in Novartis

• Today’s talk covers Research; however, HPC is also used elsewhere in the Company for

– Modelling drug absorption, metabolism & excretion (PK/PD)

– Processing data from clinical trials

– Predicting where in the lungs inhaled drugs go (CFD)

• The cluster used for this work is much more tightly controlled and tested than NIBR’s systems


Examples of how we use HPC to accelerate drug discovery


Using HPC in early drug discovery

• There are many different ways NIBR scientists use HPC

– Molecular dynamics

– NGS analysis

– Ligand-protein docking

– Image analysis

– Cryo-EM analysis

• Our usage is similar to a university with biology and chemistry departments

• In today’s talk I’ll focus on using HPC to accelerate phenotypic assays


Phenotypic assays

• Traditionally our scientists have used biochemical assays in early-stage drug discovery

– Assays use an isolated enzyme or protein and measure fluorescence, etc.

– This tells us very little about the cells and how they react

• Increasingly our scientists are using “phenotypic assays” with cells grown in the lab

– Scientists can see the impact of their drug on an entire cell or population of cells


Example: wound healing

Figure: wound-healing assay images (24 hrs); images from http://cellprofiler.org/examples/#Wound


What is High Content Screening (HCS)?

• A method for identifying molecules that alter the phenotype of cells (e.g. cell shape, number, etc.) or of small organisms (e.g. malaria parasites)

• Using robotics & automated microscopes, a large number of potential drugs can be “screened” in a few hours or days

• Assays can generate a lot of data

– Videos

– Millions of images

– >600 TB/yr for some HCS instruments
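To put “>600 TB/yr” in perspective, here is a back-of-the-envelope calculation of the sustained rate one such instrument implies. It assumes exactly 600 TB (the slide gives a lower bound), decimal terabytes and a 365-day year.

```python
TB = 10**12                          # decimal terabyte, in bytes
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 s (leap years ignored)


def sustained_rate(bytes_per_year):
    """Return (MB/s, TB/day) needed to keep pace with a yearly data volume."""
    mb_per_s = bytes_per_year / SECONDS_PER_YEAR / 10**6
    tb_per_day = bytes_per_year / 365 / TB
    return mb_per_s, tb_per_day


mb_s, tb_day = sustained_rate(600 * TB)
print(f"{mb_s:.1f} MB/s sustained, {tb_day:.2f} TB/day")
```

This comes to roughly 19 MB/s (about 150 Mbit/s) or 1.64 TB/day averaged over the whole year. Acquisition is bursty, so peak rates while a screen is running will be higher.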


Accelerating MND/ALS disease research with GPUs


In-vitro model for neuromuscular junctions

• Faulty junctions between motor neurons and muscle cells are implicated in MND/ALS (motor neuron disease / amyotrophic lateral sclerosis)

• We’d like to create a drug that corrects this

• Motor neurons & myotube (muscle fibre) cells are “co-cultured” in a “plate” to which drug candidates are added

• The cells are imaged in real time to measure their contractility

• Contraction is very hard to see by eye and also hard to segment computationally


What do the cells look like?


Figure: I Hossain


Motion estimated with Optic Flow

Figure (I Hossain): different contracting regions, and the total area under contraction
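The slides do not say which optic-flow algorithm was used, nor show the GPU implementation. Purely as an illustration of the technique, here is a minimal CPU sketch of the classical Horn–Schunck method in NumPy, plus a hypothetical “area under contraction” measure taken as the fraction of pixels whose flow magnitude exceeds a threshold. The `alpha`, `n_iter` and `threshold` values, and the synthetic moving blob, are assumptions for demonstration only.

```python
import numpy as np


def _neighbour_mean(f):
    """Mean of the 4-connected neighbours (a common simplification of
    Horn-Schunck's weighted 8-neighbour average); edges are replicated."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Dense optic flow (u = x-velocity, v = y-velocity) between two greyscale frames."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    Iy, Ix = np.gradient((f1 + f2) / 2.0)   # spatial gradients (axis 0 = y, axis 1 = x)
    It = f2 - f1                            # temporal gradient
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha**2 + Ix**2 + Iy**2
    for _ in range(n_iter):
        u_bar = _neighbour_mean(u)
        v_bar = _neighbour_mean(v)
        # Jacobi update balancing brightness constancy against flow smoothness.
        t = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v


def contracting_area(u, v, threshold=0.1):
    """Hypothetical metric: fraction of pixels moving faster than `threshold` px/frame."""
    return float(np.mean(np.hypot(u, v) > threshold))


# Synthetic demo: a Gaussian "cell" that moves 1 pixel to the right.
yy, xx = np.mgrid[0:64, 0:64]


def blob(cx):
    return 255 * np.exp(-((xx - cx) ** 2 + (yy - 32) ** 2) / (2 * 5.0 ** 2))


u, v = horn_schunck(blob(30), blob(31), alpha=10.0, n_iter=200)
print(f"mean u on the blob's flank: {u[32, 20:28].mean():.2f} px/frame")
print(f"area under contraction: {contracting_area(u, v, threshold=0.5):.1%}")
```

Each iteration is a simple per-pixel stencil over the whole image, which is the kind of data-parallel work that maps well onto GPUs.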


Impact of HPC

• A good joint project between bench scientists, lab automation experts & informaticians

• 80x increase in throughput compared with CPUs

• NIBR scientists now have access to a new method of monitoring myotube contractility


Deep learning for HCS image analysis


CNNs for HCS image analysis

• HCS analysis is traditionally performed using tools such as CellProfiler, Fiji or commercial tools

• Deep learning approaches are increasingly being used for image analysis

• A team has investigated Convolutional Neural Networks (CNNs) for deriving phenotypes from images

• They used only the images’ pixel intensity values, with no a priori knowledge

• They used public and Novartis datasets
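To make “multi-scale” and “pixel intensities only” concrete, here is a toy NumPy forward pass: the same bank of convolution filters is applied to an image at several resolutions, followed by ReLU and global average pooling, and the results are concatenated into one feature vector. This is not the published architecture; the filters are random and untrained (a real model learns them with a deep-learning framework), and the image, filter count and number of scales are hypothetical. `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def downsample2(img):
    """2x2 average pooling (crops an odd final row/column if present)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def conv2d_valid(img, kernels):
    """'Valid' cross-correlation of one 2-D image with a filter bank of shape (F, k, k)."""
    k = kernels.shape[-1]
    windows = sliding_window_view(img, (k, k))            # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,fkl->fij", windows, kernels)   # (F, H-k+1, W-k+1)


def multiscale_features(img, kernels, n_scales=3):
    """Same conv layer at several scales -> ReLU -> global average pool -> concatenate."""
    feats = []
    x = img.astype(float)
    for s in range(n_scales):
        if s > 0:
            x = downsample2(x)                             # halve the resolution
        maps = np.maximum(conv2d_valid(x, kernels), 0.0)   # ReLU
        feats.append(maps.mean(axis=(1, 2)))               # one number per filter
    return np.concatenate(feats)


rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 3, 3))   # untrained, random 3x3 filters
image = rng.random((64, 64))               # stand-in for one HCS image channel
features = multiscale_features(image, kernels, n_scales=3)
print(features.shape)
```

With 4 filters and 3 scales the feature vector has 12 entries; in a trained network such features would feed a classifier of phenotypes.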


Outcome

• The images were classified better than with conventional methods

• This included tracking the response to drugs

• There was no need to design a bespoke processing pipeline


Interested in knowing more?

• This work has been published (including some code) in Godinez et al.: “A multi-scale convolutional neural network for phenotyping high-content cellular images”, Bioinformatics btx069, https://doi.org/10.1093/bioinformatics/btx069


Pushing HPC to non-technical scientists


Why bench scientists need HPC (and don’t realise it!)

• Bench scientists generally do not know how to programme or use the Linux command line

• Many scientists’ data has grown too big to be processed on a single workstation

• This means they have to wait a long time for the data to be processed, and they may also need to wait for an informatician to become available

• Giving scientists the tools to analyse their data at scale gets them their results sooner and lets informaticians focus on more complex tasks


Pushing HCS analysis to bench scientists

• Our scientists create pipelines in CellProfiler (http://cellprofiler.org/) using the normal GUI on their laptops

• The pipeline is then uploaded to a central server at each site

• The scientist can kick off an analysis run on the cluster from the same web page that they use to visualise their images
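The slides do not describe how a run is split across the cluster. One common pattern, sketched here under assumptions, is to run CellProfiler headless and give each Grid Engine array task a contiguous slice of image sets through CellProfiler's `-f`/`-l` (first/last image set) options. Check `cellprofiler --help` for the flags of the installed version; the plate size, paths and pipeline file name below are hypothetical.

```python
import os


def n_tasks(n_image_sets, sets_per_task):
    """Array tasks needed to cover every image set (ceiling division)."""
    return -(-n_image_sets // sets_per_task)


def task_range(task_id, n_image_sets, sets_per_task):
    """1-based, inclusive (first, last) image-set range for 1-based array task `task_id`."""
    first = (task_id - 1) * sets_per_task + 1
    if task_id < 1 or first > n_image_sets:
        raise ValueError(f"task {task_id} has no image sets to process")
    last = min(task_id * sets_per_task, n_image_sets)
    return first, last


def cellprofiler_command(pipeline, input_dir, output_dir, first, last):
    """Headless CellProfiler run over image sets first..last (-c: no GUI, -r: run)."""
    return ["cellprofiler", "-c", "-r", "-p", pipeline,
            "-i", input_dir, "-o", output_dir,
            "-f", str(first), "-l", str(last)]


if __name__ == "__main__":
    N_SETS, PER_TASK = 3456, 100                    # hypothetical plate: 35 array tasks
    raw = os.environ.get("SGE_TASK_ID", "1")        # set by Grid Engine for each array task
    task_id = int(raw) if raw.isdigit() else 1      # non-array jobs may see "undefined"
    first, last = task_range(task_id, N_SETS, PER_TASK)
    cmd = cellprofiler_command("plate.cppipe", "/data/plate1",
                               f"/results/plate1/task{task_id}", first, last)
    print(" ".join(cmd))
```

Each task writes to its own output folder so that results do not overwrite one another; the per-task outputs are merged once the array job finishes.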


Screenshots of the GUI


Also (ab)using Jenkins

• Our scientists have automated cluster submission using the continuous integration tool, Jenkins, again with a web front end

• The work has been published: https://doi.org/10.1177/1087057116679993

• Source freely available at https://github.com/Novartis/Jenkins-LSCI
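Jenkins can also queue a parameterised job over HTTP, which is one way a web front end could hand work to it. The sketch below only prepares such a request with the standard library, using Jenkins' `buildWithParameters` endpoint and HTTP Basic auth with a user's API token; it is not taken from the Jenkins-LSCI code, and the server URL, job name, parameters and credentials are all hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_request(jenkins_url, job_name, params, user, api_token):
    """Prepare (but do not send) a POST that queues one run of a parameterised Jenkins job."""
    query = urllib.parse.urlencode(params)
    url = (f"{jenkins_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}"
           f"/buildWithParameters?{query}")
    # HTTP Basic auth with a Jenkins user name and that user's API token.
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(url, method="POST",
                                  headers={"Authorization": f"Basic {token}"})


req = build_trigger_request(
    "https://jenkins.example.org", "cellprofiler-run",   # hypothetical server and job
    {"PIPELINE": "plate.cppipe", "PLATE": "P001"},       # hypothetical job parameters
    "alice", "API_TOKEN_HERE")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```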


Outlook


HPC Trends

• GPUs / Intel Xeon Phi / FPGAs

– Deep learning

– Cryo-EM

• Real-time collection & processing of data from clinical trials

• Integration of “big data” technologies such as Apache Spark into HPC

• HPC in the cloud

– Currently most useful for bursting or embarrassingly parallel jobs


Thank you


Back up


HPC in the cloud


HPC in the cloud

• NIBR have used Amazon EC2 for compute workloads

– Cycle Computing

• ISVs, e.g. DNAnexus

– Bioinformatics (NGS)


Docking at scale in the cloud

• Ligand-protein docking aims “to predict the position and orientation of a ligand (a small molecule) when it is bound to a protein receptor or enzyme” (Wikipedia)

• Embarrassingly parallel: compute-heavy, data-light

• We used the cloud to screen 10 million molecules against a cancer target


How we did it

• Cycle Computing’s software (CycleServer, CycleCloud)

• Over 10,000 EC2 spot instances

– Extensive benchmarking to select the instance type

• Licence files used instead of licence servers, which cannot cope with the load

• Proprietary compounds run in NIBR’s VPC, others in “public”

• See http://opensource.nibr.com/videos/aws-litster/ and http://cyclecomputing.com/novartis-taps-cloud-hpc-for-faster-drug-discovery-better-science/
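Combining the figures on these two slides gives a rough sense of scale. Assuming the 10 million molecules were spread evenly across 10,000 instances (the slide says “over 10,000”, so this is an upper bound on the average share per instance):

$$\frac{10^{7}\ \text{molecules}}{10^{4}\ \text{instances}} = 10^{3}\ \text{molecules per instance}$$

Because docking is embarrassingly parallel, each instance can work through its share independently of the others. The slides give no per-molecule run time, so this says nothing about wall-clock time.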


Where we’re going in the cloud

• “Cloud by default” for many non-HPC applications

• Clinical data (subject to “informed consent”)

• HPC where appropriate

– InfiniBand etc. for tightly coupled parallel jobs usually unavailable

– Data locality challenging
