It's all in the genes

It’s all in the genes: The power of the Oracle database and Exadata in cancer research (@rjlkuipers)

Transcript of It's all in the genes

1. It's all in the genes: The power of the Oracle database and Exadata in cancer research

2. About me
- Business Manager Data and BI Solutions
- Datawarehouse Architect
- Business Intelligence specialist
- Master's degree in Biochemistry: molecular biology, cancer genetics

3. Agenda
- Basic genetics analyses
- Technology behind this
- What does it look like
- The next step: combining genomic data with patient data
- When both worlds meet

4. Set the context: BASIC GENETICS

6. Chromosomes

7. Genes

8. Basic genetics: DETERMINING THE GENETIC SEQUENCE

9. Genetic sequence
- Blood / cancer tissue
- DNA isolation
- DNA amplification
- DNA Sequencing (40x - 80x)

10. Genetic sequence
- approx. 5% of DNA is gene
- approx. 95% of DNA is referred to as junk-DNA
- 99% of entire DNA sequence is stable
- Genetic variations are normal

12. DNA (Next Generation) Sequencing
- From blood-sample to DNA sequence
- 3 billion basepairs
- 2 TB per sample
- unique: whole genomes

13. Abnormal genetic variations

14. Searching for the unknown
- genetic variations
- normal genetic variations
- cancer
- Better diagnoses require better analyses.
- Upfront (predictive) diagnoses require a lot of data and processing power.
- Result: less-invasive treatment, better patient-life.
- What did we not know (yet) and can be learned from
- Ultimate goal: centralized DNA library for statistical purposes

15. THE TECHNOLOGY BEHIND THIS

16. DNA (Next Generation) Sequencing
- 3 billion basepairs
- 2 TB per sample
- Whole genomes

17. Handling large volumes
- Oracle Database
  - Partitioning
  - Optimized data model
- Oracle Exadata Database Machine
  - Optimized to run Oracle Database
  - Specific performance features
    - Smart Scans
    - Exadata Hybrid Columnar Compression
- Performance increase: 700x

18. Handling large volumes - database benefits: Datamodel V1
- Sample-oriented (partitioned)
- Each base-position stored (compared to reference genome)
  - leads to 95% no-calls
- 206 samples --> 800 GB
  - max 2500 samples on Exadata
- Indexes are (still) needed: index size 5x larger than sample-size

19. Handling large volumes - database benefits: Datamodel V2
- Sample-oriented (partitioned)
- Positions are stored as regions (buckets)
  - 1000 positions per region
- Buckets are indexed
- EHCC Compression
- Reduce redundant data
  - Store allele 1 and 2 as 1 row when values are equal
- Storage 99 GB (246 samples)
  - Up to 20.000 samples
- Indexes require less space than in Datamodel V1

20. Exadata benefits
- Flash
- Parallel processing
- Smart Scans
- Exadata Hybrid Columnar Compression
- Let's have a look (videos courtesy of Frits Hoogland)

21. Executed tests
Nr   Disk type
1  -  Serial  HDD
2  -  Serial  FDD
3  -  64  HDD
4  -  64  FDD
5  SS  Serial  HDD
6  SS  Serial  FDD
7  SS  64  HDD
8  SS  64  FDD
9  SS + EHCC  64  FDD

33. Query performance (times are seconds)
Nr   Disk type   11.2.0.1   11.2.0.2
1  -  Serial  HDD  695  153
2  -  Serial  FDD  403  91
3  -  64  HDD  19  18
4  -  64  FDD  16  13
5  SS  Serial  HDD  41
6  SS  Serial  FDD  37
7  SS  64  HDD  13
8  SS  64  FDD  6
9  SS + EHCC  64  FDD  1

34. WHAT DOES IT LOOK LIKE?

36. Why is this important?
- Speed
  - Faster results
  - No is found earlier
- Volume (Centralized DNA Library)
  - Better statistical basis
- Less-invasive treatments for patients
- Personalized healthcare

37. Even more
- Add clinical data to genomic data.
- Patient history
- Drug treatment history
- Demographics
- Clinical Data
- Biobanks
- Omic Data
- Integration of Data

38. Oracle Translational Research Center (TRC)

40. Advanced visualizations

41. Summary
- Care is primary. Technology is supporting.
- Oracle offers platforms to provide better care
  - Database
  - Exadata
  - TRC
- Clinical and Genomic data are complementary.
- Not everything is in the genes
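The Datamodel V2 slide describes two space-saving ideas: positions are grouped into regions (buckets) of 1000 positions, and allele 1 and allele 2 are stored as one row when their values are equal. The slides do not show the actual schema, so the following is only a minimal Python sketch of that idea under stated assumptions: positions are taken to be 1-based, and the row layout, the "both" marker, and the names `region_of`, `to_rows`, and `index_by_region` are hypothetical, not part of the presented Oracle data model.

```python
from collections import defaultdict

REGION_SIZE = 1000  # positions per region (bucket), as stated on the slide


def region_of(position):
    """Return the bucket number for a base position (assumed 1-based)."""
    return (position - 1) // REGION_SIZE


def to_rows(sample_id, calls):
    """Turn (position, allele1, allele2) calls into storage rows.

    Hypothetical row layout: (sample_id, region, position, allele_no, value).
    Equal alleles are stored once with allele_no "both"; unequal alleles
    get one row each.
    """
    rows = []
    for position, allele1, allele2 in calls:
        region = region_of(position)
        if allele1 == allele2:
            rows.append((sample_id, region, position, "both", allele1))
        else:
            rows.append((sample_id, region, position, "1", allele1))
            rows.append((sample_id, region, position, "2", allele2))
    return rows


def index_by_region(rows):
    """Group rows by (sample_id, region), mimicking an index on buckets."""
    index = defaultdict(list)
    for row in rows:
        index[(row[0], row[1])].append(row)
    return index


if __name__ == "__main__":
    calls = [(1, "A", "A"), (1000, "C", "T"), (1001, "G", "G")]
    rows = to_rows("S1", calls)
    for key, bucket in sorted(index_by_region(rows).items()):
        print(key, bucket)
```

In this sketch a lookup only has to touch the bucket that contains the position of interest, which is one plausible reading of why the slide reports that indexes need less space in Datamodel V2 than in V1.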