Towards the Precision Medicine Era: Computational challenges
Ron Shamir, CS, TAU
Fall 2016 seminar
http://www.cs.tau.ac.il/~rshamir/seminar/16/precmedsem16.html


Lecture 1 Outline
• A little bit of biology
• Gene expression
• Protein-protein networks
• Protein-DNA networks
• Functional enrichment
• About the seminar
• Your opportunity to ask lots of questions!!!


A (very) little bit of biology


Chromosomes


DNA and Chromosomes
• DNA: a molecule made of 4 bases: A, C, G, T
• Chromosome: a contiguous stretch of DNA
• Genome: the totality of DNA material


Proteins: The Cellular Machines


Figure: DNA → (transcription) → RNA → (translation) → protein
• DNA: the hard disk
• RNA: one program
• Protein: its output
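The two steps in the figure can be illustrated on strings. The following is a toy Python sketch, not part of the original slides: it assumes the input is the coding strand of the DNA, uses the standard genetic code (NCBI translation table 1), and starts translation at the first AUG; `transcribe` and `translate` are illustrative names.

```python
# Standard genetic code, codons enumerated in base order "UCAG" (NCBI table 1).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA equals the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # only complete codons
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("CCATGTTTGGCTAAGG")
print(mrna, translate(mrna))
```

The usage line transcribes a short made-up DNA fragment and translates it up to the stop codon UAA.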


© Ron Shamir

The busy chef
• The profile of the cell: which genes are expressed as mRNAs, and in what quantities.
• Analogy: the genome holds 20,000 recipes; the cell is cooking 10,000 dishes at once, in different quantities (DNA → RNA → protein).


Biology and Computation


Complexity
• ~3,000,000,000 letters in the genome
• 2,278,100 letters in the Bible
• ⇒ one genome ≈ a stack of over 1,000 Bibles (3×10⁹ / 2.28×10⁶ ≈ 1,300)
• ~20,000 genes in the genome
  – Hard to identify
  – Harder to figure out their function
  – Even harder to figure out how they work together


Enter Bioinformatics
• The marriage of CS and Biology
• Responds to the explosion of biological data, and builds on the IT revolution


September 15, 2016: 220,731,315,250 bases
Biology is becoming an information science


Gene Expression


Now that we know the human genome sequence, what's next?
• Find out the function of genes/proteins
• Understand gene regulation
• Figure out how genes and proteins interact: gene networks, development, …
• Understand human DNA variations
• Figure out the medical implications of all the above
• Research driven by new genome-wide high-throughput technologies
• Key computational challenge: integration


DNA chips / Microarrays
• Simultaneous measurement of the expression levels of all genes
• Perform 10⁵–10⁶ measurements in one experiment
• Allow a global view of cellular processes
• Expression is now measured primarily by deep sequencing (NGS): up to 10¹⁰ bases in one experiment


The Raw Data
Figure: the raw data matrix, genes (rows) × experiments (columns), with expression levels as entries
• Entries of the raw data matrix: ratios / absolute values / …
• Row: the expression pattern of each gene
• Column: the profile of each experiment/condition/sample/chip
Needs normalization!
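The slide does not say which normalization to use. One common choice for a genes × experiments matrix is a log transform followed by per-gene standardization, so that rows can be compared by the shape of their profiles; other schemes (e.g., quantile normalization) are also widely used. A minimal sketch under that assumption; `normalize_genes` and the pseudocount of 1 are hypothetical choices, not the course's method.

```python
import math
import statistics


def normalize_genes(raw, pseudocount=1.0):
    """Log2-transform a genes x experiments matrix, then z-score each gene (row)
    so that its profile has mean 0 and unit (population) standard deviation."""
    normalized = []
    for row in raw:
        logged = [math.log2(value + pseudocount) for value in row]
        mean = statistics.mean(logged)
        sd = statistics.pstdev(logged)
        # A gene with a flat profile carries no pattern; map it to all zeros.
        normalized.append([(x - mean) / sd if sd > 0 else 0.0 for x in logged])
    return normalized


raw = [
    [1.0, 3.0, 7.0],     # gene 1 across three experiments
    [10.0, 10.0, 10.0],  # gene 2: constant expression
]
for profile in normalize_genes(raw):
    print([round(x, 3) for x in profile])
```

The pseudocount avoids taking the log of zero for unexpressed genes.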


GEO (Gene Expression Omnibus)
• Nearly 2 million expression profiles
• All publicly available, well organized
• A vast, underutilized resource


Protein interaction networks


Protein-protein interactions (PPIs)

• Low-throughput measurements: accurate, scarce
• High-throughput measurements: more abundant, noisy
• A large, readily available resource


In fact, many resources


Protein-DNA interactions


Regulation of Transcription

• A gene's transcription regulation is mainly encoded in the DNA, in a region called the promoter
• Each promoter contains several short DNA subsequences, called binding sites (BSs), that are bound by specific proteins called transcription factors (TFs)
Figure: TFs bound to their BSs in a gene's promoter (5' to 3')


Regulation of Transcription (II)

• By binding to a gene's promoter, TFs promote or repress the recruitment of the transcription machinery
• The conditions that govern a gene's transcription are determined by the specific combination of BSs in its promoter
Figure panels: Gene 1, Gene 2


Modeling TF binding sites: Position Weight Matrix (PWM)

Position  1    2    3    4    5    6
A         0.0  0.2  0.7  0.0  0.8  0.1
C         0.6  0.4  0.1  0.5  0.1  0.0
G         0.1  0.4  0.1  0.5  0.0  0.0
T         0.3  0.0  0.1  0.0  0.1  0.9

Example sequences and their scores:
ATGCAGGATACACCGATCGGTA  0.0605
GGAGTAGAGCAAGTCCCGTGA  0.0605
AAGACTCTACAATTATGGCGT  0.0151

Score: the product of the base probabilities. A score threshold must be set to call hits.
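The scoring rule can be sketched in Python as follows. This is an illustrative sketch, not code from the course: it hard-codes the PWM above, scans only the given strand, and multiplies raw probabilities exactly as the slide describes (many tools instead sum log-odds scores against a background model and also scan the reverse complement). The threshold in the usage line is an arbitrary assumption, and `window_score` and `find_hits` are hypothetical names.

```python
# PWM from the slide: probability of each base at each of the 6 positions.
PWM = {
    "A": [0.0, 0.2, 0.7, 0.0, 0.8, 0.1],
    "C": [0.6, 0.4, 0.1, 0.5, 0.1, 0.0],
    "G": [0.1, 0.4, 0.1, 0.5, 0.0, 0.0],
    "T": [0.3, 0.0, 0.1, 0.0, 0.1, 0.9],
}
WIDTH = 6


def window_score(kmer: str) -> float:
    """Score of one 6-base window: product of the per-position base probabilities."""
    score = 1.0
    for position, base in enumerate(kmer):
        score *= PWM[base][position]
    return score


def find_hits(sequence: str, threshold: float):
    """Slide the PWM along the sequence; return (start, window, score) above threshold."""
    hits = []
    for start in range(len(sequence) - WIDTH + 1):
        window = sequence[start:start + WIDTH]
        score = window_score(window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


print(find_hits("TTCCACATTT", threshold=0.01))
```

With this PWM the highest possible window score is 0.6 × 0.4 × 0.7 × 0.5 × 0.8 × 0.9 ≈ 0.0605, reached by CCACAT (or the equivalent choices of G at positions 2 and 4).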


Protein-DNA interactions

• Can be predicted using PWMs (look for hits in the promoters)
• Can be measured experimentally (ChIP-chip, ChIP-seq, PBM, …)
• The result in all cases: for each TF, a list of gene targets
• Presentable as a network
• We often combine the PPI and the PDI networks
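One way to hold the combined network is a graph with two edge types: undirected protein-protein edges and directed TF → target-gene edges. The sketch below is a minimal illustration under that assumption; the class `MixedNetwork` and the node names in the usage lines are hypothetical.

```python
from collections import defaultdict


class MixedNetwork:
    """Toy combined network: undirected protein-protein edges (PPI) plus
    directed TF -> target-gene edges (PDI)."""

    def __init__(self):
        self.ppi = defaultdict(set)      # protein -> interacting proteins
        self.targets = defaultdict(set)  # TF -> genes whose promoters it binds

    def add_ppi(self, a, b):
        self.ppi[a].add(b)
        self.ppi[b].add(a)

    def add_pdi(self, tf, gene):
        self.targets[tf].add(gene)

    def neighbors(self, node):
        """All nodes linked to `node` by either edge type, ignoring direction."""
        linked = set(self.ppi.get(node, ())) | set(self.targets.get(node, ()))
        linked |= {tf for tf, genes in self.targets.items() if node in genes}
        return linked


net = MixedNetwork()
net.add_ppi("P1", "P2")   # hypothetical proteins
net.add_pdi("TF1", "P1")  # hypothetical TF regulating P1's gene
print(net.neighbors("P1"))
```

Keeping the two edge types in separate maps lets an algorithm treat PPI edges as symmetric while still following regulation in its direction when needed.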


The hairball syndrome


Goal

• Challenge: detect active functional modules, i.e., connected subnetworks of proteins whose genes are co-expressed
• "Where is the action in the network in a particular experiment?"
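To make the goal concrete, here is a deliberately naive baseline, not a method from the course or from any specific paper: keep only network edges whose two endpoints have strongly correlated expression profiles (Pearson correlation above an assumed cutoff), then report the connected components that remain. Published module-finding algorithms use more sophisticated scoring and search; `coexpressed_modules`, `min_corr`, `min_size`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as 0
    return sxy / math.sqrt(sxx * syy)


def coexpressed_modules(edges, profiles, min_corr=0.8, min_size=3):
    """Connected components of the subnetwork made of co-expressed interacting pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if pearson(profiles[u], profiles[v]) >= min_corr:
            adj[u].add(v)
            adj[v].add(u)
    seen, modules = set(), []
    for start in list(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative DFS over the filtered edges
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(component) >= min_size:
            modules.append(component)
    return modules


profiles = {
    "a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 2, 3, 5],
    "d": [4, 3, 2, 1], "e": [1, 3, 2, 4],
}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(coexpressed_modules(edges, profiles))
```

In the toy data, a, b and c rise together across the four experiments, while d falls, so only the a-b-c part of the path survives as a module.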


Ron Shamir, RNA Antalia, April 08


Functional Enrichment


What is the Gene Ontology?

• A set of biological phrases (terms) that are applied to genes:
  – protein kinase
  – apoptosis
  – membrane
(Jane Lomax, 24th Feb 2006)


What is the Gene Ontology?

• Genes are linked, or associated, with GO terms by trained curators at genome databases; these links are known as "gene associations" or GO annotations
• This allows biologists to make inferences across large numbers of genes without researching each one individually


GO structure

Figure (Clark et al., 2005): GO terms connected by is_a and part_of relations; label: gene A
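Because GO terms form a graph linked by is_a and part_of relations, a gene annotated to a specific term is usually also counted under all of that term's ancestors when testing enrichment (GO's "true path rule"). A minimal sketch of that propagation, assuming a toy hand-written parent table; the term IDs and gene name are hypothetical, not real GO identifiers.

```python
# Hypothetical toy ontology: child term -> list of (parent term, relation).
PARENTS = {
    "term_kinase": [("term_catalytic", "is_a")],
    "term_catalytic": [("term_function", "is_a")],
    "term_membrane_part": [("term_membrane", "part_of")],
    "term_membrane": [("term_component", "is_a")],
}


def ancestors(term):
    """All terms reachable upward from `term` via is_a or part_of edges."""
    found, stack = set(), [term]
    while stack:
        for parent, _relation in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(direct):
    """Turn gene -> directly annotated terms into gene -> terms plus all ancestors."""
    return {gene: set(terms).union(*(ancestors(t) for t in terms))
            for gene, terms in direct.items()}


print(propagate({"geneA": {"term_kinase"}}))
```

The propagated annotation sets are what the per-term counts (m, k) in an enrichment test are usually computed from.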


Reminder: Hypergeometric score
• Urn with N balls, of which m are red
• Draw n balls at random, without replacement
• X = number of red balls drawn

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

$$HG(N, m, n, k) = \sum_{k'=k}^{\min(n,m)} P(X = k')$$

• HG(N, m, n, k) is the p-value of drawing at least k red balls if the draw is random; a small value indicates enrichment
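The HG score can be computed directly from the two formulas with Python's `math.comb`. A small sketch; `hg_pmf`, `hg_score`, and the example numbers are illustrative, with red balls standing for genes annotated to a GO term and the draw standing for the gene set of interest.

```python
from math import comb


def hg_pmf(N, m, n, k):
    """P(X = k): exactly k red balls when drawing n of N balls (m red) without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def hg_score(N, m, n, k):
    """HG(N, m, n, k) = P(X >= k): sum of the pmf from k up to min(n, m)."""
    return sum(hg_pmf(N, m, n, kk) for kk in range(k, min(n, m) + 1))


# Hypothetical example: 5000 genes, 50 annotated with a GO term,
# and a set of 100 genes of interest that contains 8 annotated genes.
print(hg_score(5000, 50, 100, 8))
```

If SciPy is available, the same tail probability should be given by `scipy.stats.hypergeom.sf(k - 1, N, m, n)`, since SciPy orders the parameters as (total, successes, draws) and `sf(k - 1)` is P(X ≥ k).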


GO Enrichment

• I have a set of genes/proteins. Is it enriched for a particular function?
• One function: use the hypergeometric p-value
• Testing all functions: use HG, but correct for multiple testing (Bonferroni/FDR)
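Both corrections named on the slide are short to implement. A sketch, assuming the input is the list of per-term HG p-values: `bonferroni` multiplies each p-value by the number of tests, and `benjamini_hochberg` returns Benjamini–Hochberg adjusted p-values (q-values) that control the false discovery rate. Either output can then be compared with a chosen cutoff such as 0.05; the function names and the p-values in the usage lines are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (q-values), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical HG p-values for 4 GO terms
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

The running minimum keeps the adjusted values monotone in the original p-values, which is what makes BH less conservative than Bonferroni while still controlling the FDR.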


The seminar


Guidelines

• You will need to dig deep for the methods: supplements (on journal websites), previous papers, …
• See the seminar website for resources
• (Re)start with the basics: definitions, examples
• Papers contain more than you can cover: select your presentation focus wisely


Guidelines (2)

• Provide intuition and examples to motivate your method

• Add something original that you thought of (and don’t hide that!)

• Focus more on the algorithms than on the results (rule of thumb: 60-40)


Planning your presentation
• Start: 3:10; break: 4:00–4:10; talk ends: 4:40, followed by 5 min for questions, then open discussion
• Use mostly slides, and the board sparingly
• Rehearse your talk!
• Prepare contingencies in case you run out of time
• At the end, summarize the paper, repeating the main results. Discuss strengths, weaknesses, and next steps.


The questionnaire

• Prepare a short (4–5 item) questionnaire on the paper
• The level should be basic, but answering should require reading the paper
• Distribute it to the students after the seminar
• Students will bring in their answers the following week, and you will grade them


Determining the final grade:
• Understanding the material: 35%
• Presenting the material: 35%
• Good choice of which material to present: 10%
• Active participation in the seminar (discussions and question sheets): 20%
• Bonus for originality: 10%
• Exceeding the time limit: −10%!!


Lecture 2 - Outline

• Precision medicine
• One story
• Your opportunity to ask lots of questions!!!

Precision medicine


Precision medicine
• Precisely tailoring therapies to subcategories of disease, often defined by genomics
• Unlike "personalized medicine", the term avoids the (mis)interpretation of per-patient drug development
• Medicine has always been personalized; the difference is the new biomedical technologies

Euan A. Ashley, The Precision Medicine Initiative: A New National Effort, JAMA. 2015;313(21):2119-2120. doi:10.1001/jama.2015.3595.


Problems with current medicine
• Even for successful drugs, the effect may be achieved in only a minority of the cohort
• High NNT (number needed to treat): the average number of patients who must be treated to help one patient (often >10 for drugs; >50 in prevention)
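NNT is the reciprocal of the absolute risk reduction (ARR), the difference between the event rate without and with treatment. A worked sketch with hypothetical trial numbers (20 of 100 untreated vs. 15 of 100 treated patients have the event); exact fractions avoid floating-point rounding, and `number_needed_to_treat` is an illustrative name.

```python
from fractions import Fraction


def number_needed_to_treat(untreated_events, untreated_total, treated_events, treated_total):
    """NNT = 1 / ARR, with ARR = event rate without treatment - event rate with it."""
    arr = Fraction(untreated_events, untreated_total) - Fraction(treated_events, treated_total)
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so NNT is undefined")
    return 1 / arr


print(number_needed_to_treat(20, 100, 15, 100))  # hypothetical trial numbers
```

Here ARR = 0.20 − 0.15 = 0.05, so NNT = 20: twenty patients must be treated for one of them to benefit. Non-integer NNTs are conventionally rounded up to a whole patient.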


PM and Genetic disease
• Cystic fibrosis: a mutated chloride channel. The drug ivacaftor helps when the channel reaches the cell surface; the subclass of patients who can benefit from it was identified by a mutation.
• Six mutation-dependent categories have been identified
(Euan A. Ashley, Towards Precision Medicine, Nat Rev Genetics 16)


PM and Genetic disease (2)
• Precision oncology: identifying and targeting the diseased pathways expressed in a tumor may help more than histology: "a better microscope"
• One study suggested that in 96% of undiagnosed primary tumors a genomic alteration could be identified, and that in 85% of cases it is potentially treatable by a known drug.


PM and Genetic disease (3)
• Clopidogrel is highly successful for heart attack prevention during surgery, but prescribing it required prior testing for mutations in CYP2C19
• Prevention! Screening high-risk families for relevant mutations can be cost-effective and life-saving


Examples of precision medicine


Pharmacogenomics
• Avoid "one size fits all" in drug prescription
• The use of genomic information to individualize drug prescribing
• Pharmacology + Genomics
• Analyze how a person's genetic makeup affects his/her drug response
• Develop effective, safe medications and doses tailored to a person's genetic makeup
• Most genetic tests are now done after diagnosis and delay prescription; in the future: preemptive testing
• CPIC (Clinical Pharmacogenetics Implementation Consortium) maintains a list of gene variants and actionable drugs


The inevitable conclusion
• To improve medicine and make it more precise and personal, we need to know the genome sequence of the individual and his/her medical history.
• To make use of such information, we first need to collect such data on many patients and analyze it seriously
• The time is ripe to do it!


Projects around the world
• US Precision Medicine Initiative: assembly of a cohort of 1 million individuals willing to share their electronic medical record data and genomic data
  – 1st generation: data from genotyping chips containing 1–2 million SNPs, or enhanced exome sequencing
  – 2nd generation: genome sequencing


Projects around the world (2) • United Kingdom 100,000 Genomes Project


Projects around the world (3)
• The United Kingdom and Denmark already have large-scale biobanks.
• The US Million Veteran Program reports recruitment currently at more than 300 000 individuals, with thousands sequenced and hundreds of thousands genotyped.
• The US eMERGE consortium combines electronic medical record data and genomic data from almost 200 000 individuals.
• The Global Alliance for Genomics and Health aims to establish a common framework of harmonized approaches for effective and responsible sharing of genomic and clinical data.
• The National Human Genome Research Institute created the Electronic Medical Records and Genomics Network, which now includes 10 EHR-based DNA repositories and >350 000 subjects.


Projects around the world (4)
• 23andMe collected data from ~1 million individuals willing to contribute their time and DNA to research.
• Regeneron partnered with the Geisinger Health System to connect exome sequences with EMR data from hundreds of thousands of patients.
• The Kaiser Permanente Northern California Research Program on Genes, Environment and Health biobank included ~200K consented subjects with saliva or blood samples linked to comprehensive longitudinal EHR data and self-reported demographic and behavioral information. A subset of 110K+ of these individuals have genome-wide genotype and telomere length data available, forming the Genetic Epidemiology Research on Adult Health and Aging cohort (numbers as of 2014).


Electronic Health Records (EHRs)
• Created and maintained by HMOs, hospitals, and clinical practice environments.
• The EHR is a mix of structured and narrative text data.
• Structured data: billing codes, laboratory tests, medication prescriptions, and certain standardized document elements (e.g., height, weight, vital signs, problem lists).
• EHR billing codes:
  – Diagnosis-related groups, to categorize hospitalizations
  – International Classification of Diseases (ICD) codes, to describe diagnoses and morbidities
  – Current Procedural Terminology codes, to describe procedures
• Narrative or text data: provider notes, especially the portions entered as "free" or unstructured text (the bulk of the data). These can be structured by NLP.
• Scanned data in analog form, e.g., radiographic images and scanned text documents. These cannot easily be searched for content.


Mobile health
• Mobile wearable devices can measure people's activity and other factors continuously and accurately.
• A natural target: physical fitness. It is easily measurable, and low fitness is a greater risk factor for all-cause mortality than smoking, diabetes, and obesity.
• MyHeartCounts: a cardiovascular mobile health study; recruited 30 000 smartphone users in 2 weeks


What to sequence per individual
• Gene panel: capture and sequence selected genes (a few dozen to a few hundred) at high coverage (~100x)
• Exome sequencing: the exons and regulatory regions (~10 Mb, 10–50x)
• Whole genome sequencing (WGS, 30x)
  – Processing requires ~1 TB
  – Final VCF file: ~1 GB
• Tradeoffs: cost, speed, sensitivity, clinical standards
• Storage and analysis challenges!! The data size of genomics will soon surpass that of online video and particle physics

Puckelwartz et al., Supercomputing for the parallelization of whole genome analysis, Bioinformatics 2014
Stephens, Z. D. et al. Big Data: astronomical or genomical? PLOS Biol. 13, e1002195 (2015).
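The scale behind the storage warning follows from simple arithmetic: raw sequenced bases ≈ target size × coverage. A back-of-the-envelope sketch, assuming a ~3×10⁹-base genome (as on the Complexity slide) and a 1-million-person cohort like the US initiative's; it counts bases only and ignores quality scores, file formats, and compression, so it is not a storage estimate in bytes. `sequenced_bases` is a hypothetical helper.

```python
# Assumption: ~3 billion letters per genome, as on the Complexity slide.
GENOME_SIZE_BP = 3_000_000_000


def sequenced_bases(target_size_bp: int, coverage: int) -> int:
    """Raw bases produced when every target base is read `coverage` times on average."""
    return target_size_bp * coverage


per_genome = sequenced_bases(GENOME_SIZE_BP, 30)  # WGS at 30x
cohort = per_genome * 1_000_000                    # e.g., a 1M-person cohort
print(f"{per_genome:.1e} bases per genome; {cohort:.1e} for the cohort")
```

At 30x, one genome means roughly 9×10¹⁰ sequenced bases, and a million genomes roughly 9×10¹⁶.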


The actionable genome
• 1500–2000 drugs FDA-approved to date
• Most drugs have a specific protein that they target or are otherwise linked to
• In that case we say that the gene is druggable or actionable
• Number of druggable genes: ~4500

http://www.raps.org/Regulatory-Focus/


Mutation types
• Somatic vs. germline
• SNV: single nucleotide variation
• Indel: insertion or deletion
• CNV: copy number variation
• SV: structural variation
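For small variants, the type can often be read off the alleles of a VCF record, where indels are written with a shared leading reference base (e.g., REF=A, ALT=ATG is an insertion of TG). A simplified sketch, assuming one ALT allele per record; it cannot tell somatic from germline (that is not encoded in the alleles), and CNVs/SVs are only recognized when written as symbolic alleles such as `<DEL>` or `<DUP>`. `classify_variant` is a hypothetical name.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Rough mutation type of a single REF/ALT allele pair, VCF-style."""
    if alt.startswith("<"):
        # Symbolic alleles such as <DEL> or <DUP> typically encode CNVs/SVs.
        return "CNV/SV"
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "multi-base substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("A", "<DUP>")]:
    print(ref, alt, classify_variant(ref, alt))
```

Real VCF files also contain multi-allelic records and complex replacements, which would need to be split or normalized before applying a rule this simple.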


Somatic mutation frequencies

(Lawrence … Getz, Mutational heterogeneity in cancer, Nature 13)


Many other issues
• Security and privacy:
  – Need to maintain data security and patient privacy
  – De-identification of the data works to some extent, but a full genome uniquely identifies the individual
• Need the informed consent of the individual to use the data
• Should the person be informed of his/her results?
• EHRs: noisy, incomplete, many biases
• Genomic data: still lacks clinical-level standards


Sources
• Ashley, The Precision Medicine Initiative, JAMA 2015
• Ashley, Towards precision medicine, Nature Rev Genetics, Sept 2016
• Hall et al., Merging Electronic Health Record Data and Genomics for Cardiovascular Research, Circ Cardiovasc Genet, April 2016