How deep learning can help to design better and safer medicine?
![Page 1: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/1.jpg)
Olexandr Isayev, Ph.D.
University of North Carolina at Chapel Hill
@olexandr http://olexandrisayev.com
How deep learning can help to design better and safer medicine?
KinomeNet: multi-task deep convolutional network
![Page 2: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/2.jpg)
About me
Ph.D. in Chemistry (computational)
Minor in CS
Worked in a federal research lab on HPC & GPU computing to solve chemical problems
Now I am research faculty at the University of North Carolina, Chapel Hill
http://olexandrisayev.com
And I am also Director of Drug Discovery at Atlas Regeneration. We use AI & multi-omics for developing regenerative medicine and stem cell differentiation technologies.
http://atlasregeneration.com/
![Page 3: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/3.jpg)
A public-private partnership that supports the discovery of new medicines through open access research
www.thesgc.org
![Page 4: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/4.jpg)
How are drugs discovered?
![Page 5: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/5.jpg)
The Long and Winding Road to Drug Discovery
Data Science approaches useful across the pipeline,
but very different techniques
aim for success, but if not:
fail early, fail cheap
![Page 6: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/6.jpg)
Medicines Are Transforming the Treatment of Many Diseases
![Page 7: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/7.jpg)
Robotic biological tests (HTS)
Robotic synthesis
![Page 8: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/8.jpg)
![Page 9: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/9.jpg)
![Page 10: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/10.jpg)
Drowning in Data but starving for Knowledge
The rapid growth of materials research has led to the accumulation of vast amounts of data. For example, 160,000 entries in the Inorganic Crystal Structure Database (ICSD)
Numerous commercial and open experimental databases: NIST, MatWeb, MatBase, etc.
Vast computational databases such as AFLOWLIB, Materials Project, and Harvard Clean Energy.
![Page 11: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/11.jpg)
Scannell et al. Nature Reviews Drug Discovery, 2012, 11, 191‐200
Decline in Pharmaceutical R&D efficiency
The cost of developing a new drug (~$2‐3B) roughly doubles every nine years.
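As a rough illustration of what "doubles every nine years" implies: a constant doubling time means exponential growth, cost(t) = cost(0) · 2^(t/9), or about 8% per year. The sketch below assumes the trend holds exactly. The function name and the $2.5B starting figure (the middle of the ~$2‐3B range above) are illustrative choices, not values taken from Scannell et al.

```python
def projected_cost(cost_now: float, years: float, doubling_years: float = 9.0) -> float:
    """Cost after `years`, assuming it doubles every `doubling_years` years."""
    return cost_now * 2.0 ** (years / doubling_years)


# Illustrative starting point of $2.5B; an assumption, not a figure from the source.
for years in (0, 9, 18, 27):
    print(f"+{years:2d} years: ${projected_cost(2.5, years):.1f}B")

# Implied yearly growth factor under the same assumption.
print(f"yearly growth: {(projected_cost(1.0, 1) - 1.0) * 100:.1f}%")
```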
![Page 12: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/12.jpg)
Why do drugs fail?
![Page 13: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/13.jpg)
Selectivity of Kinase inhibitors
All kinases bind ATP and therefore contain a conserved binding site
Most compounds inhibit more than one kinase
![Page 14: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/14.jpg)
Why Don’t We Do Better?
A Couple of Observations
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689‐700
>40% of biologically active compounds bind to more than one target
![Page 15: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/15.jpg)
Virtual Screening to identify potential hits
Candidate molecules: ~10^6 – 10^7 molecules
VIRTUAL SCREENING
• Empirical Rules/Filters
• Similarity Search
• ML or QSAR Models
• Structure-based Models
• Consensus QSAR
Potential Hits: ~10^2 – 10^3 molecules
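One of the steps named on this slide, similarity search, is commonly done by comparing molecular fingerprints with the Tanimoto coefficient. The sketch below is a toy illustration and not the pipeline from the talk: fingerprints are represented as sets of on-bit indices, and the library, the query, and the 0.5 threshold are all made up.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |a & b| / |a | b| of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query: set, library: dict, threshold: float = 0.5) -> list:
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Hypothetical fingerprints (sets of on-bit positions).
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}
query = {1, 4, 7}

print(similarity_search(query, library))
# [('mol_a', 0.75), ('mol_b', 0.5)]
```

In a real funnel this kind of cheap filter would typically run before the more expensive model-based steps, so that the library shrinks at every stage.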
![Page 16: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/16.jpg)
Our vision for next-gen cheminformatics platforms
• Scale up Machine Learning Methods with the Data
• Use the full variety of available data (-omics, sensors, etc.)
• Take advantage of the latest algorithmic developments: Deep Learning
![Page 17: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/17.jpg)
Collected all human kinase data from open sources
• ChEMBL
• PKIS
• PubChem
• Private datasets
• Literature, patents, etc.
300,000+ Molecules
489 Targets
>800,000 Experimental data points
Biggest target data: >25,000 molecules
Smallest target data: 1
Human Kinase Inhibitor Data Collection
![Page 18: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/18.jpg)
Human Kinase IC50 Data Distribution
“Popular” targets
“Rare” targets
![Page 19: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/19.jpg)
Convolutional Neural Network (ConvNet)
![Page 20: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/20.jpg)
Convolution Function (Filter)
Comes from Image and Signal Processing
The easiest way to understand a convolution is by thinking of it as a sliding window function applied to a matrix.
Groundbreaking results of DL are mostly based on networks with convolutional filters
• Image recognition
• Object detection
• Medical image processing
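The "sliding window" description of a convolution can be sketched in a few lines of plain Python. This is a minimal illustration with no padding and stride 1, using made-up numbers. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j).
            window_sum = sum(
                matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(window_sum)
        out.append(row)
    return out


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
# A 2x2 filter that responds to horizontal change (right column minus left column).
edge_filter = [
    [-1, 1],
    [-1, 1],
]

print(conv2d_valid(image, edge_filter))
# [[2, 2, -8], [2, 2, -12]]
```

The same small filter is reused at every position, which is why convolutional layers need far fewer weights than a fully connected layer over the same input.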
![Page 21: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/21.jpg)
Different Levels of Abstraction
• Hierarchical Learning
• Natural progression from low level to high level structure as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
• A good lower level representation can be used for many distinct tasks
![Page 22: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/22.jpg)
KinomeNet: Convolutional Neural Network for QSAR
ConvNet
2D matrix of Descriptors
Multitask Learning (253 targets)
ABL1
ACVR1
ZAK
ZAP70
…
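The multitask idea on this slide, one shared network feeding one output per kinase, can be sketched as follows. This is not the KinomeNet implementation: the convolutional layers are replaced by a ready-made feature vector, all weights and labels are hypothetical, and skipping unmeasured targets in the loss is an assumption about how sparse activity data might be handled.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multitask_predict(shared_features, heads):
    """One probability per target, all computed from the same shared feature vector."""
    return {
        target: sigmoid(sum(w * f for w, f in zip(weights, shared_features)) + bias)
        for target, (weights, bias) in heads.items()
    }


def masked_log_loss(predictions, labels):
    """Mean binary cross-entropy over the targets that actually have a label."""
    losses = [
        -(y * math.log(predictions[t]) + (1 - y) * math.log(1.0 - predictions[t]))
        for t, y in labels.items()
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical numbers: a 3-dimensional shared representation and three target heads.
features = [0.5, -1.0, 2.0]
heads = {
    "ABL1": ([1.0, 0.0, 0.5], 0.0),
    "ACVR1": ([0.0, 1.0, 0.0], 0.0),
    "ZAP70": ([0.0, 0.0, 0.0], 0.0),
}
# ACVR1 was not measured for this compound, so it contributes nothing to the loss.
labels = {"ABL1": 1, "ACVR1": None, "ZAP70": 0}

probs = multitask_predict(features, heads)
print(probs)
print(masked_log_loss(probs, labels))
```

Because every head reads the same features, gradients from data-rich targets also shape the representation used by data-poor ones.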
![Page 23: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/23.jpg)
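Some Statistics & Performance Numbers

| Model | Target | N compounds | Active @ 1 µM | AUC | TN | FP | TP | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Forest | MAP4K4 | 160 | 10 | 0.88 | 149 | 1 | 1 | 9 | 0.1 | 0.93 |
| Random Forest | BMX | 155 | 151 | 0.78 | 0 | 4 | 151 | 0 | 1.0 | 0.0 |
| DL Model | MAP4K4 | 160 | 10 | 0.91 | 150 | 0 | 6 | 4 | 0.6 | 0.94 |
| DL Model | BMX | 155 | 151 | 0.93 | 4 | 0 | 149 | 6 | 0.99 | 1.0 |

RF (Random Forest) average AUC: 0.90
KinomeNet average AUC: 0.96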
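Sensitivity and specificity are derived from the confusion-matrix counts (TN, FP, TP, FN) by the standard definitions. The short sketch below applies them to the Random Forest counts quoted on this slide. Only rows whose counts and rates agree in the transcript are used, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Random Forest, BMX: TN=0, FP=4, TP=151, FN=0.
print(sensitivity(tp=151, fn=0), specificity(tn=0, fp=4))
# 1.0 0.0

# Random Forest, MAP4K4: TP=1, FN=9, so only 1 of the 10 actives is recovered.
print(sensitivity(tp=1, fn=9))
# 0.1
```

The BMX row shows why AUC alone can hide a problem on imbalanced targets: a model that calls nearly everything active reaches perfect sensitivity while identifying none of the inactive compounds.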
![Page 24: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/24.jpg)
KinomeNet: “Deorphanizing” rare targets
ConvNet
Multitask Learning (253 targets)
ABL1
ACVR1
ZAK
ZAP70
…
2D matrix of Descriptors
![Page 25: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/25.jpg)
KinomeNet: “Deorphanizing” rare targets
ConvNet
“Rare” targets (67 targets)
ACVR1
TYMS
…
“Frequent” (253 targets)
Multitask Learning (320 targets)
2D matrix of Descriptors
![Page 26: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/26.jpg)
Why it Works: Transfer Learning
• Feature‐representation‐transfer
• To learn a “good” feature representation for the target domain.
• The knowledge used to transfer across domains is encoded into the learned feature representation.
• With the new feature representation, the performance of the target task is expected to improve.
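One simple way to picture feature-representation transfer is to keep a learned representation fixed and fit only a small new output on top of it for a data-poor target. The sketch below does this with a stand-in feature function and four made-up compounds. It illustrates the general idea under those assumptions and is not a description of how KinomeNet itself was trained.

```python
import math


def frozen_features(descriptors):
    """Stand-in for shared layers learned on data-rich targets; kept fixed here."""
    x1, x2 = descriptors
    return [x1 + x2, x1 - x2]


def _probability(weights, bias, features):
    logit = sum(w * v for w, v in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def train_head(samples, labels, lr=0.5, epochs=200):
    """Fit only a logistic-regression head on top of the frozen features (plain SGD)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            err = _probability(weights, bias, f) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * v for w, v in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(head, descriptors):
    weights, bias = head
    return _probability(weights, bias, frozen_features(descriptors))


# Four hypothetical compounds measured against a "rare" target.
samples = [(1.0, 1.0), (0.9, 0.8), (-1.0, -0.5), (-0.7, -1.0)]
labels = [1, 1, 0, 0]

head = train_head(samples, labels)
print(predict(head, (1.0, 0.9)), predict(head, (-1.0, -0.9)))
```

Only three numbers (two weights and a bias) are fitted for the new target, which is why a handful of measurements can be enough when the reused representation is a good one.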
![Page 27: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/27.jpg)
Recovery of Kinase Similarity by the Network
![Page 28: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/28.jpg)
Atlas Regeneration
Young dynamic startup company (formed in 2015) in North Carolina
We use AI to develop regenerative medicine
Design molecules to induce iPSC stem cell differentiation
Tissue and muscle regeneration, fibrosis
![Page 29: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/29.jpg)
BIG CHEMICAL DATA
FAST ARTIFICIAL INTELLIGENCE TOP HITS
250M+ SCREENING MOLECULES
o Integrated public data (PubChem, ChEMBL, etc.)
o Private datasets
o Literature and patents
o In vitro (HTS)
o In vivo (mouse, rats)
o Multi-omics
o Signaling Pathways
o Gene Expression
AI Drug Discovery Platform
![Page 30: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/30.jpg)
200M+ of potential candidates
Selectivity
Off-target binding
Toxicity
Metabolic stability
Bioavailability
Solubility
etc.
7
• Good selectivity
• Three novel scaffolds
• Predicted potency 7 – 25 nM
• Good synthetic accessibility
• Good ADME/Tox properties
Large scale prediction of bioactivity with Deep Learning
TGF beta inhibitor (Fibrosis)
FAST ARTIFICIAL INTELLIGENCE
![Page 31: How deep learning can help to design better and safer ...€¦ · Numerous commercial and open experimental databases NIST, MatWeb, MatBaseetc. Vast computational databases such as](https://reader036.fdocuments.net/reader036/viewer/2022070913/5fb47e83ba7af0182644f9be/html5/thumbnails/31.jpg)
• Data availability is the biggest barrier
• Novel architecture for multitask QSAR
• Improvement over well-converged RF models
• Convenience: 1 vs 320 models
• Training 1 network is faster than training 320 RF models
• Scalability of DL to “Big Data”
• DL benefits from transfer learning
• The more tasks and the more data, the higher the benefit
• Transferability: KinomeNet -> GPCRNet
Conclusions