Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A...
Transcript of Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A...
![Page 1: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/1.jpg)
Data integration and computational systems approaches to target discovery
and pharmacokinetic modelling
Kenji Mizuguchihttp://[email protected]
![Page 2: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/2.jpg)
Data integration for pharmacokinetic modelling
Target Discovery
![Page 3: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/3.jpg)
Psychiatric
OthersMuscle
Gastric Hepatictoxicity
Cardiactoxicity
Basis for understanding toxicity, as well as poor efficacy
Pharmacokinetics; PK
Kenji Mizuguchi(NIBIOHN)
Teruki Homma (RIKEN)
Hiroshi Yamada(NIBIOHN)
Cardiac toxicity Hepatic toxicity
• Integrated databases • Multi-scale modelling
(prediction) systems
J.S. MacDonald, R.T. Robertson
Primary Reasons for Drug Withdrawals from US, European or Asian Markets (Toxicity) 1998-2008
Toxicity-related attrition
funded by
![Page 4: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/4.jpg)
Drug Discovery Informatics System
Compound prioritization at screening
Molecular design during optimization
Pharmacokinetics
Cardiac toxicity Hepatic toxicityPortal site
DB
DB
Prediction system
Prediction system Prediction system
DB
Collaboration with pharmaceutical
companies
DB integration
![Page 5: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/5.jpg)
ADMET
Bloo
d co
ncen
tratio
n
Time
A drug which has a well absorption and faster metabolismA drug which has a poor absorption and slower metabolism
Absorption and Metabolism(Therapeutic drug monitoring)
DistributionThe compound needs to be carried to its effector site, most often via the bloodstream.
The kidney is the most important site and it is where products are excreted through urine.
Excretion
Toxicity
Absorption
DistributionMetabolism
Excretion
![Page 6: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/6.jpg)
Import the curated data to database
Non-curated experimental data
CLint (intrinsic clearance)
JournalComp. A66 mL/min/mg
JournalComp. B135 μL/min/mg
Check the original publications
Correct value and convert units
Convert units
Different protocol
No.
Name Value Unit Description
1 Comp. A 66000 mL/min/g Intrinsic clearance in human liver microsomes
2 Comp. B 135 mL/min/g Drug metabolism in human liver microsomes assessed as intrinsic clearance
3 Comp. C 58.5 mL/min/kg Intrinsic clearance in human microsome in presence of NADPH
4 Comp. C 16.2 mL/min/kg Intrinsic clearance in human microsome
JournalComp. CPresence of NADPH58.5 μL/min/mg
Absence of NADPH16.2 μL/min/mg Discard
No. 1
No. 2
No. 3
No. 4
No. Name Value Unit Description
1 Comp. A 66 mL/min/mg Intrinsic clearance in human liver microsomes
2 Comp. B 135 μL/min/mg Drug metabolism in human liver microsomes assessed as intrinsic clearance
3 Comp. C 58.5 μL/min/mg Intrinsic clearance in humanmicrosome in presence of NADPH
Convert units
public DB
Collecting pharmacokinetic parameters
Esaki et al. Mol Inform (2018)
![Page 7: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/7.jpg)
External test set
Curated data
Non-curated data
HN
O
CH3
HO
O
OH
CH3
CH3
H3C
O
O
CH3
OH
HN
O
CH3
HO
Calculate Descriptors from Structure
• Atom count• MW• logP• PSA
O
N
N
O
OH
NH2
O
NH2
…
O
N
N
O
OH
NH2
O
NH2
Accuracy:0.75
Kappa:0.56
Accuracy:0.77
Kappa:0.59
Accuracy:0.67
Kappa:0.47
Accuracy:0.74
Kappa:0.54
CH3
ONCH3
CH3
Data quality influences prediction accuracy
Build prediction models & Cross validation
Evaluateusing test set
![Page 8: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/8.jpg)
Data integration for pharmacokinetic modelling
Target Discovery
![Page 9: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/9.jpg)
• Integrate a wide range of data sources.• Enable complicated searches that are
difficult to achieve with existing tools.• Custom annotations and in-house data.
at the Cambridge Systems Biology Centre
Chen et al., PLoS ONE 6(3): e17844 (2011)
TargetMine
http://targetmine.mizuguchilab.org
Data warehouse for drug target prioritization
Powered by
![Page 10: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/10.jpg)
Data sources in TargetMine
iRefIndex
Chemical compounds & interactions
Domains & 3D structures
Pathways & interactions Genes & proteins
Transcription factortarget associations
micro RNA &target associations
INTEGRATION
Homo sapiens, Rattus norvegicus, Mus musculus
![Page 11: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/11.jpg)
Questionnaire EMR
Clinical laboratory tests Pulmonary Function Tests
X-ray, CT
Integrated DB
Obtaining structured clinical data
With Osaka University Respiratory Medicine, Medical Informatics
![Page 12: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/12.jpg)
Patient vectors
DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD (EHR) ANALYSIS 8
these codes. Patient representation frameworks are typicallyoptimizing a supervised learning task (e.g. predicting patientmortality) by improving the representation of the input (e.g.,the patients) to the deep learning network.
(1) Concept Representation: Several recent studies haveapplied deep unsupervised representation learning techniquesto derive EHR concept vectors that capture the latent sim-ilarities and natural clusters between medical concepts. Werefer to this area as EHR concept representation, and itsprimary objective is to derive vector representations fromsparse medical codes such that similar concepts are nearbyin the lower-dimensional vector space. Once such vectorsare obtained, codes of heterogeneous source types (such asdiagnoses and medications) can be clustered and qualitativelyanalyzed with techniques such as t-SNE [18], [19], [23], word-cloud visualizations of discriminative clinical codes [20], orcode similarity heatmaps [40].
Distributed Embedding: Since clinical concepts are oftenrecorded with a time stamp, a single encounter can be viewedas a sequence of discrete medical codes, similar to a sentenceand its ordered list of words. Several researchers have appliedNLP techniques for summarizing sparse medical codes into afixed-size and compressed vector format. One such techniqueis known as skip-gram, a model popularized by Mikolov etal. in their word2vec implementation [54]. Word2Vec is anunsupervised ANN framework for obtaining vector represen-tations of words given a large corpus where the representationof a word depends on the context, and this technique is oftenused as a pre-processing step with many text-based deeplearning models. Similarly, E. Choi et al. [18], [22] and Y.Choi et al. [39] both use skip-gram in the context of clinicalcodes to derive distributed code embeddings. Skip-gram forclinical concepts relies on the sequential ordering of medicalcodes, and in the study of Y. Choi et al. [39], the issue ofmultiple clinical codes being assigned the same time stamp ishandled by partitioning a patient’s code sequence into smallerchunks, randomizing the order of events within each chunk,and treating each chunk as a separate sequence.
Latent Encoding: Aside from NLP-inspired methods, othercommon deep learning representation learning techniques havealso been used for representing EHR concepts. Tran et al.formulate a modified restricted RBM which uses a structuredtraining process to increase representation interpretation [23].In a similar vein, Lv et al. use AEs to generate conceptvectors from word-based concepts extracted from clinicalfree text [36]. They evaluated the strength of relationshipsbetween various medical concepts, and found that traininglinear models on representations obtained via AEs greatlyoutperformed traditional linear models alone, achieving state-of-the-art performance.
(2) Patient Representation: Several different deep learningmethods for obtaining vector representations of patients havebeen proposed in the literature [20], [22], [23], [38], [39]. Mostof the techniques are either inspired by NLP techniques suchas distributed word representations [54], or use dimensionalityreduction techniques such as autoencoders [13].
One such NLP-inspired approach is taken by Choi et al.[18], [22], [38] to derive distributed vector representations
Demographics Diagnosis Codes
Procedure Codes
Medication Codes
Laboratory Codes
Encoded Representation
Thousands of Sparse Codes
Fixed-size and Compressed Vector
Fig. 8. Illustration of how autoencoders can be used to transform extremelysparse patient vectors into a more compact representation. Since medicalcodes are represented as binary categorical features, raw patient vectorscan have dimensions in the thousands. Training an autoencoder on thesevectors produces an encoding function to transform any given vector intoit’s distributed and dimensionality-reduced representation.
of patient sentences, i.e. ordered sequences of ICD-9, CPT,LOINC, and National Drug Codes (NDC), using both skip-gram and recurrent neural networks. Similarly, the Deeprframework uses a simple word embedding layer as input toa larger CNN architecture for predicting unplanned hospitalreadmission [19].
Miotto et al. directly generate patient vectors from rawclinical codes via stacked AEs, and show that their systemachieves better generalized disease prediction performanceas compared to using the raw patient features [14]. Theraw features are vectorized via a three-layer AE network,with the final hidden layer’s weights yielding the patient’scorresponding representation. As a part of their frameworkfor patient representation, they incorporated the clinical notesassociated with each patient encounter into their representationframework using a form of traditional topic modeling knownas latent Dirichlet allocation (LDA). An example of usingautoencoders for patient representation is shown in Figure 8.
Choi et al. [18] derive patient vectors by first generatingconcept and encounter representations via skip-gram em-bedding, and then using the summed encounter vectors torepresent an entire patient history to predict the onset of heartfailure . Similarly, Pham et al in their DeepCare frameworkgenerate two separate vectors for a patient’s temporal diagnosisand intervention codes, and obtain a patient representationvia concatenation, showing that the resulting patient timelinevectors contain more predictive power than classifiers trainedon the raw categorical features [20]. They employ modifiedLSTM cells for modeling time, admission methods, diagnoses,and interventions to account for complete illness history.
Aside from simple vector aggregation, it is also possibleto directly model the underlying temporal aspects of patienttimelines. Mehrabi et al. [40] use a stacked RBM trained oneach patient’s temporal diagnosis codes to produce patientrepresentations over time. They pay special attention to tem-poral aspects of EHR data, constructing a diagnosis matrix for
Omics data Images
Clustering and learning
Important features
Key proteins/genes
・・・
With AIRC
From clinical data to drug targets
![Page 13: Data integration and computational systems approaches to target … · 2018. 11. 7. · DEEP EHR: A SURVEY OF RECENT ADVANCES IN DEEP LEARNING TECHNIQUES FOR ELECTRONIC HEALTH RECORD](https://reader034.fdocuments.net/reader034/viewer/2022052014/602bbf4cbb466b3e03658144/html5/thumbnails/13.jpg)
Databases
Statistical modelling
Ab initio modelling
• Need to automate curation and knowledge extraction
Conclusions
• General themes for rational drug discovery and development
• Data integration is key