
Olexandr Isayev, Ph.D.

University of North Carolina at Chapel Hill

@olexandr http://olexandrisayev.com

How deep learning can help to design better and safer medicine?

KinomeNet: multi-task deep convolutional network

About me

Ph.D. in Chemistry (computational)

Minor in CS

Worked in a federal research lab on HPC & GPU computing to solve chemical problems

Now I am research faculty at the University of North Carolina, Chapel Hill

http://olexandrisayev.com

And I am also Director of Drug Discovery at Atlas Regeneration. We use AI & multi-omics for developing regenerative medicine and stem cell differentiation technologies.

http://atlasregeneration.com/

A public-private partnership that supports the discovery of new medicines through open access research

www.thesgc.org

How are drugs discovered?

The Long and Winding Road to Drug Discovery

Data Science approaches are useful across the pipeline, but with very different techniques

Aim for success, but if not: fail early, fail cheap

Medicines Are Transforming the Treatment of Many Diseases

Robotic biological tests (HTS)

Robotic synthesis

Drowning in Data but Starving for Knowledge

The rapid growth of materials research has led to the accumulation of vast amounts of data, for example 160,000 entries in the Inorganic Crystal Structure Database (ICSD).

Numerous commercial and open experimental databases: NIST, MatWeb, MatBase, etc.

Vast computational databases such as AFLOWLIB, Materials Project, and the Harvard Clean Energy Project.

Scannell et al. Nature Reviews Drug Discovery, 2012, 11, 191‐200

Decline in Pharmaceutical R&D efficiency

The cost of developing a new drug (~$2‐3B) roughly doubles every nine years.

Why Do Drugs Fail?

Selectivity of Kinase Inhibitors

All kinases bind ATP and therefore contain a conserved binding site

Most compounds inhibit more than one kinase

Why Don’t We Do Better?

A Couple of Observations

• Tykerb – Breast cancer

• Gleevec – Leukemia, GI cancers

• Nexavar – Kidney and liver cancer

• Staurosporine – natural product, alkaloid – many uses, e.g., antifungal, antihypertensive

Collins and Workman. Nature Chemical Biology, 2006, 2, 689‐700

>40% of biologically active compounds bind to more than one target

~10^6 – 10^7 molecules

~10^2 – 10^3 molecules

VIRTUAL SCREENING

Virtual screening to identify potential hits

Candidate molecules

• Empirical rules/filters

• Similarity search

• ML or QSAR models

• Structure-based models

• Consensus QSAR

Potential hits
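The "similarity search" step of the screening funnel is commonly implemented by comparing binary molecular fingerprints with the Tanimoto coefficient. The following is a minimal sketch under stated assumptions, not the workflow used in the talk: fingerprints are represented as plain Python sets of "on" bit indices, and the molecule names, bit values, and the 0.3 threshold are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical query and screening library (illustrative bit indices only).
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

# Keep molecules whose similarity to the query passes an assumed threshold.
threshold = 0.3
hits = [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]
```

In practice the fingerprints would come from a cheminformatics toolkit, and the threshold would be tuned per project.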

Our vision for next-gen cheminformatics platforms

• Scale up machine learning methods with the data

• Use the full variety of available data (-omics, sensors, etc.)

• Take advantage of the latest algorithmic developments: Deep Learning

Collected all human kinase data from open sources

• ChEMBL

• PKIS

• PubChem

• Private datasets

• Literature, patents, etc.

300,000+ Molecules

489 Targets

>800,000 Experimental data points

Biggest target data: >25,000 molecules

Smallest target data: 1 molecule

Human Kinase Inhibitor Data Collection

Human Kinase IC50 Data Distribution

“Popular” targets

“Rare” targets

Convolutional Neural Network (ConvNet)

Convolution Function (Filter)

Comes from Image and Signal Processing

The easiest way to understand a convolution is by thinking of it as a sliding window function applied to a matrix.
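The sliding-window picture can be sketched in a few lines of NumPy. This is an illustrative example, not code from KinomeNet: the function name `conv2d_valid`, the 4x4 input, and the 2x2 averaging filter are all assumptions made for the demonstration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix`; at each position sum the elementwise products."""
    m_rows, m_cols = matrix.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((m_rows - k_rows + 1, m_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = matrix[i:i + k_rows, j:j + k_cols]  # current window
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical 4x4 input and a 2x2 averaging filter.
matrix = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0
result = conv2d_valid(matrix, kernel)  # shape (3, 3); each entry is a local mean
```

A convolutional layer learns the kernel values instead of fixing them by hand, and applies many such filters in parallel.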

Groundbreaking results of DL are mostly based on networks with convolutional filters

• Image recognition

• Object detection

• Medical image processing

Different Levels of Abstraction

• Hierarchical Learning

• Natural progression from low level to high level structure as seen in natural complexity

• Easier to monitor what is being learnt and to guide the machine to better subspaces

• A good lower level representation can be used for many distinct tasks

KinomeNet: Convolutional Neural Network for QSAR

ConvNet

2D matrix of Descriptors

Multitask Learning (253 targets)

Some Statistics & Performance Numbers

Model          Target  N compounds  Active @ 1 µM  AUC   TN   FP  TP   FN  Sensitivity  Specificity
Random Forest  MAP4K4  160          10             0.88  149  1   1    9   0.1          0.93
Random Forest  BMX     155          151            0.78  0    4   151  0   1.0          0.0
DL Model       MAP4K4  160          10             0.91  150  0   6    4   0.6          0.94
DL Model       BMX     155          151            0.93  4    0   149  6   0.99         1.0

RF (Random Forest) average AUC: 0.90

KinomeNet average AUC: 0.96
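Sensitivity and specificity follow directly from the confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below is illustrative only (the helper name `sensitivity_specificity` is an assumption) and applies the formulas to the first BMX row (TN 0, FP 4, TP 151, FN 0).

```python
def sensitivity_specificity(tn, fp, tp, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    positives = tp + fn
    negatives = tn + fp
    sensitivity = tp / positives if positives else float("nan")
    specificity = tn / negatives if negatives else float("nan")
    return sensitivity, specificity

# Counts taken from the first BMX row: every active is found,
# but all four inactives are misclassified as active.
sens, spec = sensitivity_specificity(tn=0, fp=4, tp=151, fn=0)
```

With 151 of 155 compounds active, a model that labels everything "active" reaches perfect sensitivity and zero specificity, which is why both numbers are reported alongside AUC.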

KinomeNet: “Deorphanizing” rare targets

ConvNet

Multitask Learning (253 targets)

2D matrix of Descriptors

KinomeNet: “Deorphanizing” rare targets

ConvNet

“Rare” targets (67 targets)

… “Frequent” (253 targets)

Multitask Learning (320 targets)

2D matrix of Descriptors
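One way to picture multitask learning over many kinase targets is a shared representation feeding one output per target, with the loss computed only where a compound-target measurement exists. The sketch below is an assumption-laden illustration, not the KinomeNet architecture: it replaces the ConvNet with a single dense ReLU layer, uses random weights and labels, and the names `masked_bce`, `w_shared`, and `w_heads` are hypothetical.

```python
import numpy as np

def masked_bce(probs, labels, mask):
    """Binary cross-entropy averaged over measured (mask == 1) entries only."""
    p = np.clip(probs, 1e-7, 1 - 1e-7)
    per_entry = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float((per_entry * mask).sum() / max(mask.sum(), 1.0))

rng = np.random.default_rng(0)
n_compounds, n_features, n_hidden, n_targets = 5, 8, 4, 3

x = rng.normal(size=(n_compounds, n_features))      # descriptor vectors (random stand-ins)
w_shared = rng.normal(size=(n_features, n_hidden))  # shared layer (dense stand-in for the ConvNet)
w_heads = rng.normal(size=(n_hidden, n_targets))    # one output column per target

hidden = np.maximum(x @ w_shared, 0.0)              # shared representation (ReLU)
probs = 1.0 / (1.0 + np.exp(-(hidden @ w_heads)))   # per-target activity probabilities

labels = rng.integers(0, 2, size=(n_compounds, n_targets)).astype(float)
# mask[i, t] == 1 where compound i was measured against target t;
# the third ("rare") target has a single measurement.
mask = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 0],
                 [1, 0, 1]], dtype=float)

loss = masked_bce(probs, labels, mask)
```

Because all targets share `w_shared`, gradients from data-rich targets also shape the features used by the target with only one data point.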

Why it Works: Transfer Learning

• Feature‐representation‐transfer

• To learn a “good” feature representation for the target domain.

• The knowledge used to transfer across domains is encoded into the learned feature representation.

• With the new feature representation, the performance of the target task is expected to improve.
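Feature-representation transfer can be sketched as reusing a learned feature extractor unchanged and fitting only a small new output layer on the scarce data of the target task. The example below is a toy illustration under stated assumptions: `w_shared` is a random stand-in for weights learned on data-rich targets, the six "rare target" compounds and their labels are made up, and the head is a plain logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for weights learned on the data-rich targets (random here; hypothetical).
w_shared = rng.normal(size=(8, 4))
w_shared_before = w_shared.copy()

def features(x):
    """Frozen feature extractor: these weights are never updated below."""
    return np.tanh(x @ w_shared)

def head_loss(w, b, h, y):
    """Binary cross-entropy of a logistic-regression head on features h."""
    p = np.clip(1.0 / (1.0 + np.exp(-(h @ w + b))), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# A handful of made-up compounds for a "rare" target.
x_rare = rng.normal(size=(6, 8))
y_rare = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])

h = features(x_rare)
w_head, b_head = np.zeros(4), 0.0
loss_before = head_loss(w_head, b_head, h, y_rare)

# Train only the new head; the shared representation stays fixed.
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(h @ w_head + b_head)))
    grad = p - y_rare
    w_head -= 0.1 * (h.T @ grad) / len(y_rare)
    b_head -= 0.1 * grad.mean()

loss_after = head_loss(w_head, b_head, h, y_rare)
```

Only five parameters are fitted for the new target, which is what makes learning from very few measurements feasible when the transferred features are informative.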

Recovery of Kinase Similarity by the Network

Atlas Regeneration

Young dynamic startup company (formed in 2015) in North Carolina

We use AI to develop regenerative medicine

Design molecules to induce iPSC stem cell differentiation

Tissue and muscle regeneration, fibrosis

BIG CHEMICAL DATA

FAST ARTIFICIAL INTELLIGENCE

TOP HITS

250M+ SCREENING MOLECULES

o Integrated public data (PubChem, ChEMBL, etc.)

o Private datasets

o Literature and patents

o In vitro (HTS)

o In vivo (mouse, rats)

o Multi-omics

o Signaling Pathways

o Gene Expression

AI Drug Discovery Platform

200M+ potential candidates

Selectivity, off-target binding, toxicity, metabolic stability, bioavailability, solubility, etc.

• Good selectivity

• Three novel scaffolds

• Predicted potency 7 – 25 nM

• Good synthetic accessibility

• Good ADME/Tox properties

Large scale prediction of bioactivity with Deep Learning

TGF beta inhibitor (Fibrosis)

FAST ARTIFICIAL INTELLIGENCE

Conclusions

• Data availability is the biggest barrier

• Novel architecture for multitask QSAR

• Improvement over well-converged RF models

• Convenience: 1 vs 320 models

• Training 1 network is faster than training 320 RF models

• Scalability of DL to “Big Data”

• DL benefits from transfer learning

• The more tasks and data, the higher the benefit

• Transferability: KinomeNet -> GPCRNet
