Big Data in Toxicogenomics: Towards FAIR predictions
Danyel Jennen
ICCA 2018
Department of Toxicogenomics 2
Alternatives to animal testing
“Toxicity Testing in the 21st Century”
REACH
Many promising in vitro prediction models
Is it the right cell model?
Or the dosages used?
So what if …
… use all data in one big meta-analysis
Aim & explanation of the title
Meta-analysis for in vivo genotoxicity prediction using gene expression data from multiple in vitro cell models
Big Data in Toxicogenomics: Towards FAIR predictions
Using gene expression data from multiple toxicity studies stored in freely accessible databases that also provide the corresponding meta-data for a “good” and “honest” prediction
Workflow
Step 1. Data collection
Step 2. Data processing
Step 3. Train prediction model
Step 4. Validate prediction
Step 5. Biological interpretation
Step 1. Data collection
CEBS
Transcriptomics data
Compound information
Step 1. Data collection
| Source | Cell model | Number of samples (incl. controls) | Number of compounds |
|---|---|---|---|
| diXa data warehouse: DIXA002 / carcinoGENOMICS | primary rat hepatocytes, non-TSA treated | 205 | 15 |
| diXa data warehouse: DIXA002 / carcinoGENOMICS | primary rat hepatocytes, TSA treated | 196 | 15 |
| diXa data warehouse: DIXA-028 / DrugMatrix | primary rat hepatocytes | 939 | 124 |
| Open TG-GATEs | primary rat hepatocytes | 3370 | 145 |
TSA = trichostatin A
Total number of unique compounds: 235
However, there are sufficient data to classify only 69 of them: in vivo GTX (genotoxic): 24; in vivo NGTX (non-genotoxic): 45
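Going from per-study compound counts to a unique total, and then to the compounds that can actually be labelled, amounts to a set union followed by a filter. A minimal sketch, assuming hypothetical per-study compound lists and a hypothetical `in_vivo_calls` mapping from compound to "GTX" or "NGTX"; the compound names are placeholders, not the real study contents:

```python
def usable_compounds(study_compounds, in_vivo_calls):
    """Union compounds across studies, then keep those with an in vivo call.

    study_compounds: dict mapping a study ID to an iterable of compound names
    in_vivo_calls:   dict mapping a normalized compound name to "GTX" or "NGTX";
                     compounds without sufficient in vivo data are simply absent
                     (hypothetical structure, for illustration only)
    """
    unique = set()
    for compounds in study_compounds.values():
        # normalize names so the same compound from two studies is counted once
        unique.update(c.strip().lower() for c in compounds)
    labelled = {c: in_vivo_calls[c] for c in sorted(unique) if c in in_vivo_calls}
    return unique, labelled

# Hypothetical toy input: placeholder compound names and calls
studies = {
    "study_A": ["Cmpd1", "Cmpd2", "Cmpd3"],
    "study_B": ["cmpd2", "Cmpd4"],
}
calls = {"cmpd1": "GTX", "cmpd4": "NGTX"}
unique, labelled = usable_compounds(studies, calls)
print(len(unique), labelled)  # 4 {'cmpd1': 'GTX', 'cmpd4': 'NGTX'}
```

Matching on normalized names is the simplest option; a stable identifier such as a CAS number would likely be a safer join key in practice.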
Step 2. Data processing
Affymetrix Rat Genome 230 / Affymetrix Rat Genome 230 2.0 Array
RMA normalization, custom CDF version 22
Ensembl gene IDs
Log2 ratios
Averaging biological replicates
12162 genes × 619 exposures
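RMA-normalized values are already on a log2 scale, so a log2 ratio is simply treated minus control. A minimal sketch of the last two processing steps, assuming each sample is a hypothetical dict of gene ID → RMA value and that each treated replicate is compared with the mean of its matched controls (the slide does not say how controls were matched):

```python
from statistics import mean

def exposure_profile(treated_reps, control_reps):
    """Mean log2 ratio per gene for one exposure.

    treated_reps, control_reps: lists of dicts mapping gene ID -> RMA value
    (RMA output is already on a log2 scale).
    """
    # keep only genes measured in every treated and control sample
    genes = set.intersection(*(set(rep) for rep in treated_reps + control_reps))
    profile = {}
    for g in sorted(genes):
        control_level = mean(rep[g] for rep in control_reps)
        # log2(treated / control) == log2(treated) - log2(control)
        ratios = [rep[g] - control_level for rep in treated_reps]
        profile[g] = mean(ratios)  # average over biological replicates
    return profile

# Toy example with hypothetical gene IDs and values
treated = [{"gene_a": 9.0, "gene_b": 7.0}, {"gene_a": 11.0, "gene_b": 7.0}]
controls = [{"gene_a": 8.0, "gene_b": 7.5}, {"gene_a": 8.0, "gene_b": 6.5}]
print(exposure_profile(treated, controls))  # {'gene_a': 2.0, 'gene_b': 0.0}
```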
619 exposures: 205 GTX, 414 NGTX
Split into training and test sets
80% vs 20%
10 training/test sets
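The 80/20 split repeated 10 times can be sketched with the standard library alone. Stratifying by class, so that each test set keeps roughly the 205:414 GTX:NGTX ratio, is an assumption here; the slide only gives the proportions and the number of splits. The function name and seed are illustrative:

```python
import random

def stratified_splits(labels, n_splits=10, train_frac=0.8, seed=0):
    """Return n_splits (train_idx, test_idx) pairs, each stratified by class label."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    splits = []
    for _ in range(n_splits):
        train, test = [], []
        for idxs in by_class.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)  # each split is drawn independently
            n_train = round(train_frac * len(shuffled))
            train.extend(shuffled[:n_train])
            test.extend(shuffled[n_train:])
        splits.append((sorted(train), sorted(test)))
    return splits

labels = ["GTX"] * 205 + ["NGTX"] * 414  # class sizes from the slide
splits = stratified_splits(labels)
train, test = splits[0]
print(len(train), len(test))  # 495 124
```

With these counts each split has 495 training and 124 test exposures (41 GTX, 83 NGTX). Because the unit here is the exposure, the same compound can appear in both sets at a different dose or time point; splitting by compound instead would be a stricter test of generalization to new chemicals.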
Step 3. Train prediction model
Regularized Logistic Regression
PAM: Nearest Shrunken Centroids
Random Forest
k-Nearest Neighbors
Partial Least Squares
GBM: Stochastic Gradient Boosting
xgbLinear: eXtreme Gradient Boosting
Support vector machines: svmLinear, svmLinear2, svmLinearWeights
10 different classification algorithms
R package caret
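The models themselves were trained with caret in R, which wraps other packages for each method. Purely to illustrate what regularized logistic regression optimizes, here is a from-scratch sketch of L2-penalized logistic regression fitted by batch gradient descent, in Python rather than R. It is not caret's implementation, and the penalty `lam`, learning rate and epoch count are arbitrary assumptions rather than the study's tuned values:

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l2_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X: list of feature vectors (e.g. log2 ratios per gene); y: 1 = GTX, 0 = NGTX.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # gradient step on the data term plus the L2 penalty term lam * w
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, X):
    # class 1 (GTX) when the predicted probability exceeds 0.5
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

# Toy, linearly separable data with two hypothetical features
X = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.0], [-2.0, 0.1], [-1.5, 0.3], [-1.7, -0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_l2_logistic(X, y)
print(predict(w, b, X))
```

The penalty shrinks gene weights toward zero, which matters when there are far more genes (12162) than exposures.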
Step 3. Train prediction model
Performance on the training set (run 1)
Logistic regression and SVM score best
Step 4. Validate prediction
Performance on the validation set (average of 10 runs)

| Metric | PAM | svmL | Logistic |
|---|---|---|---|
| Accuracy | 0.776983 | 0.956457 | 0.943685 |
| Sensitivity | 0.363177 | 0.910826 | 0.891524 |
| Specificity | 0.968498 | 0.978004 | 0.968957 |
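Sensitivity and specificity depend on which class counts as positive. The averages above look consistent with GTX being the positive class: accuracy is the prevalence-weighted mean of sensitivity and specificity, and assuming the test sets mirror the overall ratio (205 of 619 GTX), 0.331 × 0.911 + 0.669 × 0.978 ≈ 0.956 for svmL. The same relation explains how PAM keeps an accuracy near 0.78 despite a sensitivity of 0.36: its high specificity on the larger NGTX class dominates. A minimal sketch of the three metrics (the function name is illustrative):

```python
def classification_metrics(y_true, y_pred, positive="GTX"):
    """Accuracy, sensitivity and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of GTX exposures called GTX
        "specificity": tn / (tn + fp),  # fraction of NGTX exposures called NGTX
    }

# Toy example: 2 GTX and 3 NGTX exposures, one error in each class
y_true = ["GTX", "GTX", "NGTX", "NGTX", "NGTX"]
y_pred = ["GTX", "NGTX", "NGTX", "NGTX", "GTX"]
m = classification_metrics(y_true, y_pred)
print(m)
```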
Step 4. Validate prediction
• Per run, between 1 and 12 exposures were misclassified
• In total, 39 out of 553 unique exposures were misclassified
• Accuracy 93%
• Misclassification usually occurred at the extremes
  – Either the lowest or the highest dosage
  – Either the shortest or the longest exposure
• No clear relationship between dosage/exposure time and genotoxicity
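The totals on this slide fit together under one simple assumption, which is ours rather than the slide's: if each of the 10 test sets is an independent random 20% of the 619 exposures, an exposure lands in at least one test set with probability 1 − 0.8^10 ≈ 0.89, which gives about 553 unique test exposures, and 39 misclassified out of 553 gives the quoted 93%. A quick check of that arithmetic:

```python
n_exposures = 619
test_fraction = 0.2
n_runs = 10

# probability that a given exposure is never drawn into any test set
p_never_tested = (1 - test_fraction) ** n_runs
expected_unique_tested = n_exposures * (1 - p_never_tested)

misclassified, unique_tested = 39, 553
pooled_accuracy = 1 - misclassified / unique_tested

print(round(expected_unique_tested), round(100 * pooled_accuracy))  # 553 93
```

The pooled 93% counts each unique exposure once, which is why it differs slightly from the per-run averages in the validation table.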
Step 5. Biological interpretation
Selection of top genes with at least 60% contribution towards classification
509 genes in at least 1 run; 60 genes in all runs
PathVisio over-representation analysis
60 genes → 11 significant pathways (Z-score > 1.96)
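PathVisio ranks pathways by an over-representation Z-score. A sketch of the hypergeometric standardized-difference form that, as we understand it, PathVisio uses (following MAPPFinder), with N the measured genes (e.g. the 12162 genes here), R the genes meeting the criterion (e.g. the 60 selected genes), n the measured genes in a pathway and r the selected genes in that pathway. The pathway counts in the example are invented for illustration; 1.96 is the two-sided 5% point of the standard normal:

```python
import math

def pathway_z_score(N, R, n, r):
    """Standardized difference between observed and expected hits in a pathway.

    N: genes measured in total        R: genes meeting the selection criterion
    n: measured genes in the pathway  r: selected genes in the pathway
    """
    expected = n * R / N
    # hypergeometric variance (binomial variance with finite-population correction)
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Hypothetical pathway: 100 measured genes, 5 of them among the 60 selected genes
z = pathway_z_score(N=12162, R=60, n=100, r=5)
print(round(z, 2), z > 1.96)
```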
Step 5. Biological interpretation
p53 pathway
p53 signal pathway
Relationship between glutathione and NADPH
ATM Signaling Pathway
Genetic alterations of lung cancer
G1 to S cell cycle control
Eicosanoid Synthesis
Cell cycle
Fatty Acid Biosynthesis
Selenium Micronutrient Network
Folic Acid Network
Step 5. Biological interpretation
Legend: present in all 10 runs; present in 5-9 runs; present in 1-4 runs; not present in any run; not in gene list
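The legend bins each gene by how many of the 10 runs selected it, and the same counts yield figures such as "509 genes in at least 1 run, 60 genes in all runs". A minimal sketch, assuming each run's selected genes are available as a set; the function name and gene IDs are hypothetical placeholders:

```python
from collections import Counter

def presence_summary(selected_per_run, gene_list):
    """Bin genes like the legend and count genes selected in any / all runs.

    selected_per_run: one set of selected gene IDs per run (10 runs on the slide)
    gene_list: set of all gene IDs in the processed data
    """
    n_runs = len(selected_per_run)
    counts = Counter(g for run in selected_per_run for g in run)

    def category(gene):
        if gene not in gene_list:
            return "not in gene list"
        k = counts[gene]  # a Counter returns 0 for genes never selected
        if k == n_runs:
            return f"present in all {n_runs} runs"
        if k >= 5:  # thresholds follow the 10-run legend on the slide
            return "present in 5-9 runs"
        if k >= 1:
            return "present in 1-4 runs"
        return "not present in any run"

    in_any = len(counts)  # genes selected in at least one run
    in_all = sum(1 for k in counts.values() if k == n_runs)
    return category, in_any, in_all

# Toy example: 10 runs with hypothetical gene IDs
runs = [{"gene_a", "gene_b", "gene_c"}] * 2 + [{"gene_a", "gene_b"}] * 4 + [{"gene_a"}] * 4
category, in_any, in_all = presence_summary(runs, {"gene_a", "gene_b", "gene_c", "gene_d"})
print(in_any, in_all)  # 3 1
```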
Step 5. Biological interpretation
Cdkn1a: cyclin-dependent kinase inhibitor 1A
Ccng1: cyclin G1
Rprm: reprimo, TP53 dependent G2 arrest mediator homolog
Mdm2: MDM2 proto-oncogene
Gtse1: G-2 and S-phase expressed
Summary
• The presented prediction of in vivo genotoxicity outperforms the standard in vitro test battery
  – Accuracy ~95% vs ~70%
• A fair number of compounds has been used in the analysis
  – 69 compounds (24 GTX & 45 NGTX)
• The analysis includes multiple dosages and time points
  – >600 exposures
• Top features are biologically relevant
  – p53, cell cycle and apoptosis-related pathways
Future perspective
• Fine-tune the prediction analysis on rat data
  – Add more compounds/studies
• Proceed with the prediction analysis on human data
  – First results presented at IWGT 2017
• Proceed with the prediction analysis on mouse data
  – Data sets are limited
• Work on automated data and meta-data retrieval
  – Use APIs and containerization (Docker)
• Present final results at Eurotox 2018
Acknowledgement
Juma Bayjanov, Jos Kleinjans