"When time matters..."

AnalyzeGenomes.com: When time matters… Dr.-Ing. Matthieu-P. Schapranow, Festival of Genomics, London, UK, Jan 31, 2017

Transcript of "When time matters..."


What is the Hasso Plattner Institute, Potsdam, Germany?

Dr. Schapranow, Festival of Genomics, Jan 31, 2017

When time matters...


From Raw Genome Data to Analysis


■  DNA Sequencing: Transformation of analog DNA into digital format

■  Alignment: Reconstruction of the complete genome from snippets

■  Variant Calling: Identification of genetic variants

■  Data Annotation: Linking genetic variants with research findings


1. DNA Sequencing

■  Purpose: Transformation of analog DNA into digital format (A/D converter)

■  Input: Chunks of DNA

■  Output: DNA reads in digital form, e.g. in FASTQ format


4peaks.app

1. Output of Sequencing

■  FASTQ format used for further processing

■  One read is a 4-tuple of:

1.  Sequence identifier / description

2.  Raw sequence

3.  Separator line ("+"), optionally repeating the identifier

4.  Quality values per sequenced base
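The four lines that make up one FASTQ read can be parsed with a few lines of Python. This is a minimal sketch, not part of the Analyze Genomes platform: the names `FastqRead` and `parse_fastq` are made up for illustration, and the quality decoding assumes the common Phred+33 ASCII encoding, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRead:
    identifier: str        # line 1, without the leading "@"
    sequence: str          # line 2, raw nucleotide sequence
    separator: str         # line 3, starts with "+"
    qualities: List[int]   # line 4, one score per sequenced base


def parse_fastq(lines: Iterable[str], phred_offset: int = 33) -> Iterator[FastqRead]:
    """Yield one FastqRead per group of four non-empty lines.

    phred_offset=33 is an assumption (Phred+33 encoding).
    """
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must consist of 4-line records")
    for i in range(0, len(cleaned), 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRead(header[1:], sequence, separator, scores)


# Made-up example record, not taken from the talk.
EXAMPLE = ["@read1 example", "GATTACA", "+", "IIIIII#"]
reads = list(parse_fastq(EXAMPLE))
```

With Phred+33, the character `I` decodes to a quality of 40 and `#` to 2, so the last base of the example read is the least reliable one.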


2. Alignment Overview

■  Purpose: Mapping of DNA reads to a reference

■  Input:

□  DNA reads := Sequence of nucleotides with a length of 100 bp up to some 1 kbp

□  Reference genome := Blueprint for alignment of DNA reads

■  Output: Mapped DNA reads

■  Bear in mind:

□  Less fragmentation of the DNA, i.e. longer reads, allows more precise alignment

□  Reference from same origin improves mapping quality
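Why longer reads map more precisely can be seen in a toy example. The sketch below is a brute-force illustration only: it slides the read along the reference and counts mismatches, whereas production aligners use indexed, gap-aware algorithms. The reference string and the function names are invented for this example.

```python
from typing import List


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(reference: str, read: str, max_mismatches: int = 0) -> List[int]:
    """Return every 0-based start position where the read fits the reference
    with at most max_mismatches substitutions (no insertions or deletions)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        if mismatches(window, read) <= max_mismatches:
            hits.append(start)
    return hits


# Invented toy reference containing a repeated motif.
REFERENCE = "GATTACAGGGATTACCTT"

short_hits = candidate_positions(REFERENCE, "GATTAC")    # ambiguous: two placements
long_hits = candidate_positions(REFERENCE, "GATTACAG")   # unique: one placement
```

The six-base read fits at two positions, while extending it by two bases leaves a single placement.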


3. Variant Calling Overview

■  Purpose: Variant Calling := Detect variations within a genome

■  Input:

□  Mapped DNA reads, i.e. output of alignment process

□  Reference genome

■  Output: List of variants

■  Bear in mind:

□  Read depth at pos i := Number of sequenced nucleotides, across all mapped reads, storing information about pos i
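Read depth and a very naive form of variant calling can be sketched as follows. This is an illustration under simplifying assumptions, not the method used by the platform: reads are given as (0-based start, sequence) pairs without gaps, only substitutions are considered, and the thresholds `min_depth` and `min_fraction` are arbitrary example values.

```python
from collections import Counter
from typing import List, Tuple

MappedRead = Tuple[int, str]  # (0-based start position on the reference, read sequence)


def pileup(mapped_reads: List[MappedRead], ref_length: int) -> List[Counter]:
    """Count, for every reference position, which bases the reads report there."""
    columns = [Counter() for _ in range(ref_length)]
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < ref_length:
                columns[position][base] += 1
    return columns


def read_depth(columns: List[Counter], i: int) -> int:
    """Read depth at position i: number of read nucleotides covering i."""
    return sum(columns[i].values())


def call_variants(reference: str, columns: List[Counter],
                  min_depth: int = 2, min_fraction: float = 0.5):
    """Report (position, ref base, alt base, depth) where enough reads disagree."""
    variants = []
    for position, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence at this position
        for base, count in sorted(counts.items()):
            if base != reference[position] and count / depth >= min_fraction:
                variants.append((position, reference[position], base, depth))
    return variants


# Invented toy data: three reads report "A" where the reference has "T".
REFERENCE = "ACGTACGT"
READS = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC"), (4, "ACGT")]
columns = pileup(READS, len(REFERENCE))
variants = call_variants(REFERENCE, columns)
```

The higher the read depth at a position, the more evidence supports (or contradicts) a candidate variant there.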


4. Genetic Annotations

■  Purpose:

□  Assess impact of genetic changes

□  Understand gene function and possible medical therapy options

■  Input: List of genetic variants

■  Output: Details about a given genetic locus


4. Challenges Today

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE

chr7 140753336 rs113488022 T A 61 PASS NS=1 GT 0/1

■  Manual, time-consuming Internet search, e.g. for publications, annotations, guidelines

■  International consortia provide fragmented information

■  Missing linkage across individual data sources

■  Annotations vary in completeness and correctness
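The VCF record shown on this slide can be turned into a structured object before any annotation lookup. The following is a minimal sketch: `parse_vcf_record` is a hypothetical helper, it splits on whitespace because the slide shows spaces (real VCF files are tab-separated), and it handles exactly one sample column.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def parse_vcf_record(line: str) -> dict:
    """Parse one single-sample VCF data line into a dictionary."""
    fields = line.split()
    if len(fields) != len(VCF_COLUMNS):
        raise ValueError(f"expected {len(VCF_COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(VCF_COLUMNS, fields))
    record["POS"] = int(record["POS"])
    record["QUAL"] = float(record["QUAL"])

    # INFO is a ";"-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    for item in record["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True
    record["INFO"] = info

    # FORMAT names the ":"-separated values of the sample column.
    keys = record["FORMAT"].split(":")
    values = record["SAMPLE"].split(":")
    record["SAMPLE"] = dict(zip(keys, values))
    return record


record = parse_vcf_record("chr7 140753336 rs113488022 T A 61 PASS NS=1 GT 0/1")
```

Here the genotype `0/1` marks a heterozygous call, and the `ID` field (`rs113488022`) is the key used for lookups in sources such as dbSNP.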


4. Interpretation of Annotations: BRAF Gene dbSNP

■  https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs113488022


4. Interpretation of Annotations: BRAF Gene KEGG


4. Interpretation of Annotations: BRAF Gene GeneCards


4. Interpretation of Annotations: BRAF Gene PubMed


Simplified Clinical Oncology Process (1/2)


Simplified Clinical Oncology Process (2/2)

Our Motivation: Turn Precision Medicine Into Clinical Routine

■  Can we enable clinicians to make their therapy decisions:

□  Incorporating all available patient specifics,

□  Referencing latest lab results and worldwide medical knowledge, and

□  In an interactive manner during their ward round?


Use Case: Precision Oncology - Identification of Best Treatment Option for Cancer Patient

■  Patient: 48 years, female, non-smoker, smoke-free environment

■  Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV

■  Markers: KRAS, EGFR, BRAF, NRAS, (ERBB2)

1.  Remove tumor through surgery

2.  Send tumor sample to laboratory for DNA extraction

3.  Sequence complete DNA of sample, resulting in 750 GB of raw genome data

4.  Process raw genome data, e.g. alignment, variant calling, and annotation

5.  Identify relevant variants using international medical knowledge

6.  Support decision making, e.g. link to de-identified historic cases


Our Vision: Medical Board Incorporating Latest Medical Knowledge


The Challenge: Distributed Heterogeneous Data Sources

■  Human genome/biological data: 600 GB per full genome, 15 PB+ in databases of leading institutes

■  Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)

■  Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov

■  Human proteome: 160M data points (2.4 GB) per sample, >3 TB raw proteome data in ProteomicsDB

■  PubMed database: >23M articles

■  Hospital information systems: often more than 50 GB

■  Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data

■  Cancer patient records: >160k records at NCT


Software Requirements in Life Sciences

■  Requirements

□  Managed services

□  Reproducibility

□  Real-time data analysis

■  Restrictions

□  Data privacy

□  Data locality

□  Volume of big medical data


http://stevedempsen.blogspot.de/2013/08/agile-software-requirements-comic.html

Project Timeline

[Figure: project timeline, 2009 to 2017. Labels: IMDB Research, Enterprise Software, SAP HANA launched, Oncolyzer, SORMAS, Drug Response Analysis, Medical Knowledge Cockpit, Analyze Genomes Platform]


Our Approach: AnalyzeGenomes.com In-Memory Computing Platform for Big Medical Data


In-Memory Database


Combined and Linked Data

Genome Data

Cellular Pathways

Genome Metadata

Research Publications

Pipeline and Analysis Models

Drugs and Interactions


Indexed Sources


Extensions for Life Sciences

Data Exchange, App Store

Access Control, Data Protection

Fair Use

Statistical Tools

Real-time Analysis

App-spanning User Profiles


Drug Response Analysis

Pathway Topology Analysis

Medical Knowledge Cockpit

Oncolyzer

Clinical Trial Recruitment

Cohort Analysis

...


Real-Time Data Analysis: In-Memory Database Technology

Combined column and row store

Map/Reduce

Single and multi-tenancy

Lightweight compression

Insert only for time travel

Real-time replication

Working on integers

SQL interface on columns and rows

Active/passive data store

Minimal projections

Group key

Reduction of software layers

Dynamic multi-threading

Bulk load of data

Object-relational mapping

Text retrieval and extraction engine

No aggregate tables

Data partitioning

Any attribute as index

No disk

On-the-fly extensibility

Analytics on historical data

Multi-core/parallelization
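Three of the listed building blocks (column store, lightweight compression, working on integers) are commonly realized through dictionary encoding of columns. The sketch below illustrates that general idea only; it is an assumption that this matches what the slide refers to, and it is not the database's actual implementation. All names and the example column are invented.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each column value by a small integer code.

    Returns the sorted dictionary of distinct values and the encoded column.
    """
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def scan_equals(dictionary: List[str], codes: List[int], value: str) -> List[int]:
    """Return row numbers whose column value equals `value`.

    The string is looked up once; the scan itself compares integers only.
    """
    if value not in dictionary:
        return []
    target = dictionary.index(value)
    return [row for row, code in enumerate(codes) if code == target]


# Invented example column of gene names.
GENE_COLUMN = ["BRAF", "KRAS", "EGFR", "KRAS", "BRAF", "KRAS"]
dictionary, codes = dictionary_encode(GENE_COLUMN)
kras_rows = scan_equals(dictionary, codes, "KRAS")
```

Repeated values are stored once in the dictionary, which compresses the column, and scans compare integers instead of strings.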


Managed Services provided by Federated In-Memory Database System (FIMDB)


[Figure: FIMDB architecture. Labels: Nodes i, j, k, m, n, ..., each with several Workers and an IMDB; Scheduler; Relay; Cloud Service Provider (Shared Algorithms and Public Reference Data); Hospital or Research Department (Sensitive/Patient Data); VPN; UDP; TCP; Shared File System (Pool); Shared File System (Global)]


Reproducibility: Modeling of Data Analysis Pipelines

1.  Design time (researcher, process expert)

□  Definition of parameterized process model

□  Uses graphical editor and jobs from repository

2.  Configuration time (researcher, lab assistant)

□  Select model and specify parameters, e.g. alignment options

□  Results in model instance stored in repository

3.  Execution time (researcher)

□  Select model instance

□  Specify execution parameters, e.g. input files
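The three phases above can be sketched as plain data structures. This is a hypothetical illustration of the separation between model, model instance, and execution, not the platform's API: all class names, function names, job names, and parameter values are invented, and `execute` only records what would be run.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProcessModel:
    """Design time: an ordered list of jobs plus the parameters left open."""
    name: str
    jobs: Tuple[str, ...]
    parameters: Tuple[str, ...]


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model with every parameter fixed."""
    model: ProcessModel
    settings: Tuple[Tuple[str, str], ...]


def configure(model: ProcessModel, **settings: str) -> ModelInstance:
    """Bind all parameters of a model; reject missing or unknown ones."""
    missing = set(model.parameters) - set(settings)
    unknown = set(settings) - set(model.parameters)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    return ModelInstance(model, tuple(sorted(settings.items())))


def execute(instance: ModelInstance, input_files: List[str]) -> Dict[str, object]:
    """Execution time: combine a stored instance with concrete input files.

    Returns a run record; storing it makes the run repeatable later.
    """
    return {
        "model": instance.model.name,
        "jobs": list(instance.model.jobs),
        "settings": dict(instance.settings),
        "inputs": list(input_files),
    }


# Invented example values.
model = ProcessModel("variant-calling",
                     ("alignment", "variant_calling", "annotation"),
                     ("alignment_options",))
instance = configure(model, alignment_options="max_mismatches=2")
run = execute(instance, ["sample.fastq"])
```

Because the instance is immutable and the run record lists every setting and input, two executions of the same instance on the same files are described by identical records.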


App Example: Cloud-based Services for Processing of DNA Data

■  Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF

■  Personal user profile guarantees privacy of uploaded and processed data

■  Supports reproducible research process by storing all relevant process parameters

■  Implements prioritized data processing and fair use, e.g. per department or per institute

■  Supports additional services, such as data annotations, billing, and sharing for all Analyze Genomes services

■  Honored by the 2014 European Life Science Award

Standardized modeling and runtime environment for analysis pipelines

Real-time Data Analysis and Interactive Exploration

App Example: Identification of Optimal Chemotherapy

■  Patient-specific Data: smoking status, tumor classification, and age (1 MB - 100 MB)

■  Tumor-specific Data: raw DNA data and genetic variants (100 MB - 1 TB)

■  Compound Interaction Data: medication efficiency and wet lab results (10 MB - 1 GB)

■  Honored by the 2015 PerMediCon Award


Showcase

Calculating Drug Response… Predict Drug Response

cetuximab might be more beneficial for the current case

App Example: Medical Knowledge Cockpit for Patients and Clinicians

■  Query-oriented search interface

■  Seamless integration of patient specifics, e.g. from EMR

■  Parallel search in international knowledge bases, e.g. for biomarkers, literature, cellular pathways, and clinical trials


Medical Knowledge Cockpit for Patients and Clinicians: Pathway Topology Analysis

■  Search in pathways is limited to “is a certain element contained” today

■  Integrated >1.5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA

■  Implemented graph-based topology exploration and ranking based on patient specifics

■  Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start


Unified access to multiple formerly disjoint data sources

Pathway analysis of genetic variants with graph engine
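Graph-based topology exploration goes beyond a containment check by asking which pathway elements lie downstream of an affected gene. The sketch below uses a breadth-first search over a toy directed graph; it does not use HANA's graph engine, the two pathways are simplified and invented for illustration, and the ranking score (number of affected elements) is an assumption.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Pathway = Dict[str, List[str]]  # element -> elements it directly acts on


def downstream(pathway: Pathway, start: str) -> Dict[str, int]:
    """Breadth-first search: distance from `start` to every reachable element."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in pathway.get(node, []):
            if successor not in distance:
                distance[successor] = distance[node] + 1
                queue.append(successor)
    return distance


def elements(pathway: Pathway) -> set:
    """All elements that appear in the pathway, as source or target."""
    found = set(pathway)
    for targets in pathway.values():
        found.update(targets)
    return found


def rank_pathways(pathways: Dict[str, Pathway],
                  variant_genes: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank pathways by how many elements lie downstream of a variant gene."""
    genes = list(variant_genes)
    scores = []
    for name, pathway in pathways.items():
        affected = set()
        for gene in genes:
            if gene in elements(pathway):
                affected.update(downstream(pathway, gene))
        scores.append((name, len(affected)))
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Simplified toy pathways, for illustration only.
PATHWAYS = {
    "toy_mapk": {"EGFR": ["KRAS"], "KRAS": ["BRAF"], "BRAF": ["MEK"], "MEK": ["ERK"]},
    "toy_other": {"TP53": ["CDKN1A"]},
}
braf_effects = downstream(PATHWAYS["toy_mapk"], "BRAF")
ranking = rank_pathways(PATHWAYS, ["BRAF"])
```

A containment search would only report that BRAF occurs in `toy_mapk`; the traversal additionally shows which elements a BRAF variant could affect and how many steps away they are.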


Medical Knowledge Cockpit for Patients and Clinicians: Publications

■  Interactively explore relevant publications, e.g. PDFs

■  Improved ease of exploration, e.g. by highlighted medical terms and relevant concepts


Real-time Assessment of Clinical Trial Candidates

■  Switch from trial-centric to patient-centric clinical trials

■  Real-time matching and clustering of patients and clinical trial inclusion/exclusion criteria

■  No manual pre-screening of patients for months: In-memory technology enables interactive pre-screening process

■  Reassessment of already screened or already participating patients reduces recruitment costs
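Patient-centric matching can be sketched as evaluating each trial's inclusion and exclusion criteria against one patient record. This is an illustrative sketch only: the trial identifiers and criteria are hypothetical, the patient attributes follow the use case from earlier in the talk, and real criteria are far richer than simple predicates.

```python
from typing import Callable, Dict, List

Patient = Dict[str, object]
Criterion = Callable[[Patient], bool]


def is_eligible(patient: Patient, inclusion: List[Criterion],
                exclusion: List[Criterion]) -> bool:
    """Eligible if every inclusion criterion holds and no exclusion criterion does."""
    return (all(rule(patient) for rule in inclusion)
            and not any(rule(patient) for rule in exclusion))


def eligible_trials(patient: Patient, trials: List[dict]) -> List[str]:
    """Return the ids of all trials the patient qualifies for."""
    return [trial["id"] for trial in trials
            if is_eligible(patient, trial["inclusion"], trial["exclusion"])]


# Patient attributes as in the use case: 48 years, female, non-smoker, NSCLC stage IV.
PATIENT = {"age": 48, "sex": "female", "smoker": False,
           "diagnosis": "NSCLC", "stage": "IV"}

# Hypothetical trials and criteria, invented for this example.
TRIALS = [
    {"id": "TRIAL-A",
     "inclusion": [lambda p: p["diagnosis"] == "NSCLC", lambda p: p["age"] >= 18],
     "exclusion": [lambda p: p["smoker"]]},
    {"id": "TRIAL-B",
     "inclusion": [lambda p: p["diagnosis"] == "NSCLC",
                   lambda p: p["stage"] in ("I", "II")],
     "exclusion": []},
]

matching = eligible_trials(PATIENT, TRIALS)
```

Re-running the same check whenever a patient record or a trial's criteria change is what makes reassessment of already screened patients cheap.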

Assessment of patients' preconditions for clinical trials

What to Take Home? Learn more and test-drive it yourself: AnalyzeGenomes.com

■  For patients

□  Identify relevant clinical trials and medical experts

□  Become an informed patient

■  For clinicians

□  Identify pharmacokinetic correlations

□  Scan for similar patient cases, e.g. to evaluate therapy efficiency

■  For researchers

□  Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants

□  Combined mining in structured and unstructured data, e.g. publications, diagnosis, and EMR data


Keep in contact with us!


Dr.-Ing. Matthieu-P. Schapranow

Program Manager E-Health & Life Sciences

Hasso Plattner Institute

August-Bebel-Str. 88

14482 Potsdam, Germany

[email protected]

http://we.analyzegenomes.com/