The Role of Statisticians in Personalized Medicine: An Overview of Statistical Methods in Bioinformatics

Setia Pramana
Seminar at STIS Jakarta, 8 August 2014

Page 1

The Role of Statisticians in Personalized Medicine: An Overview of Statistical Methods in Bioinformatics

Setia Pramana

STIS Jakarta, 8 August 2014

Page 2

Outline
• Drug Development
• Personalized Medicine
• Central Dogma
• Microarray Data Analysis
• Next Generation Sequencing
• Summary

Page 3

Drug Development
• Takes 10-15 years
• Costs millions of USD
• Who: pharmaceutical, biotechnology, and device companies; universities and government research agencies
• Regulators: the US Food and Drug Administration, BPOM
• Evaluate:
  – Safety: can people take it?
  – Efficacy: does it do anything in humans?
  – Effectiveness: is it better than, or at least as good as, what is currently available?
  – Do the benefits outweigh the risks?

Page 4

Drug Development
• The stages:
  - Drug discovery
  - Pre-clinical development
  - Clinical development (4 phases)
• Statisticians are involved in all stages
• Stages are highly regulated
• Results are based on what works for most patients
• But... patients are not all alike!

Page 5

Patient Heterogeneity

Page 6

Patient Heterogeneity
• We’re all different in:
  - Physiological and demographic characteristics
  - Medical history
  - Genetic/genomic characteristics
• What works for a patient with one set of characteristics might not work for another!

Page 7

Patient Heterogeneity
• “One size does not fit all”
• Use a patient’s characteristics to determine the best treatment for him/her
• Genomic information has great potential
--> Personalized medicine: “The right treatment for the right patient at the right time”

Page 8

Personalized Medicine
• The ability to determine an individual's unique molecular characteristics and to use those genetic distinctions to diagnose an individual's disease more finely, select treatments that increase the chances of a successful outcome, and reduce possible adverse reactions.
• Personalized medicine is also the ability to predict an individual's susceptibility to diseases and thus to shape steps that may help avoid a disease or reduce the extent to which the individual experiences it.

Page 9

Subgroup Identification and Targeted Treatment
• Determine subgroups of patients who share certain characteristics and would respond better to a particular treatment
• Discover biomarkers that can identify the subgroup
• Focus on finding and treating a subgroup

Page 10

Subgroup Identification and Targeted Treatment
Diagram columns: Genotype, Phenotype, Intervention, Outcome
• Genotype: mutations/SNPs, gene/protein expression, epigenetics
• Phenotype: diseases, disability, etc.
• Intervention: drugs, therapies, regimens
• Personalized medicine

Page 11

Advanced Biomedical Technologies
• High-throughput microarrays and molecular imaging to monitor SNPs and gene and protein expression
• Next-Generation Sequencing

Page 12

First... a Bit of Biology

Page 13

Central Dogma
Source: http://compbio.pbworks.com

Page 14

Gene
• The full DNA sequence of an organism is called its genome
• A gene is a segment of DNA that specifies the sequence of one or more proteins.

Page 15

Genomics
• The study of all the genes of a cell or tissue at:
  – the DNA level (genotype), e.g., GWAS, SNP, CNV, etc.
  – the mRNA level (transcriptomics): gene expression
  – the protein level (proteomics)
• Functional genomics: the study of the function of specific genes, their relation to diseases, their associated proteins, and their participation in biological processes.

Page 16

Microarrays

Page 17

Microarray
• DNA microarrays are a technology that allows the expression of thousands of genes to be monitored at once.

Page 18

Applications
• Drugs with high efficacy and few or no side effects
• Genes related to disease
• Biological discovery
  – new and better molecular diagnostics
  – new molecular targets for therapy
  – finding and refining biological pathways
• Molecular diagnosis of leukemia, breast cancer, etc.
• Appropriate treatment for a given genetic signature
• Potential new drug targets

Page 19

Microarray
Overview of the process of generating high-throughput gene expression data using microarrays.

Page 20

Pipeline
• Experiment design, lab work, image processing
• Signal summarization (RMA, GCRMA)
• Normalization
• Data analysis:
  – Differentially expressed genes
  – Clustering
  – Classification
  – Etc.
• Network / pathways (GSEA, etc.)
• Biological interpretation

Page 21

Microarray Data Structure

Page 22

Preprocessed Data

Genes   C1    C2    C3    T1    T2    T3
G8522   6.78  6.55  6.37  6.89  6.78  6.92
G8523   6.52  6.61  6.72  6.51  6.59  6.46
G8524   5.67  5.69  5.88  7.43  7.16  7.31
G8525   5.64  5.91  5.61  7.41  7.49  7.41
G8526   4.63  4.85  5.72  5.71  5.47  5.79
G8528   7.81  7.58  7.24  7.79  7.38  8.60
G8529   4.26  4.20  4.82  3.11  4.94  3.08
G8530   7.36  7.45  7.31  7.46  7.53  7.35
G8531   5.30  5.36  5.70  5.41  5.73  5.77
G8532   5.84  5.48  5.93  5.84  5.73  5.75
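
The table above is small enough to work through directly. The following is a minimal sketch in Python (the talk itself points to R), assuming that C1-C3 are control arrays and T1-T3 are treated arrays, which the slide does not state. It uses a plain Welch t statistic per gene as a stand-in; it is not one of the modified tests used in practice.

```python
from math import sqrt
from statistics import mean, variance

# Rows copied from the table above. Assumption: C1-C3 = control, T1-T3 = treated.
data = {
    "G8522": ([6.78, 6.55, 6.37], [6.89, 6.78, 6.92]),
    "G8523": ([6.52, 6.61, 6.72], [6.51, 6.59, 6.46]),
    "G8524": ([5.67, 5.69, 5.88], [7.43, 7.16, 7.31]),
    "G8525": ([5.64, 5.91, 5.61], [7.41, 7.49, 7.41]),
    "G8526": ([4.63, 4.85, 5.72], [5.71, 5.47, 5.79]),
    "G8528": ([7.81, 7.58, 7.24], [7.79, 7.38, 8.60]),
    "G8529": ([4.26, 4.20, 4.82], [3.11, 4.94, 3.08]),
    "G8530": ([7.36, 7.45, 7.31], [7.46, 7.53, 7.35]),
    "G8531": ([5.30, 5.36, 5.70], [5.41, 5.73, 5.77]),
    "G8532": ([5.84, 5.48, 5.93], [5.84, 5.73, 5.75]),
}


def welch_t(control, treated):
    """Return (mean difference, Welch t statistic) for treated vs. control."""
    diff = mean(treated) - mean(control)
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return diff, diff / se


results = {gene: welch_t(c, t) for gene, (c, t) in data.items()}

# Rank genes by the absolute size of the statistic, largest first.
ranked = sorted(results, key=lambda g: abs(results[g][1]), reverse=True)
for gene in ranked:
    diff, t = results[gene]
    print(f"{gene}  diff={diff:+.2f}  t={t:+.2f}")
```

With only three arrays per group the per-gene variance estimates are unstable, which is one motivation for the modified t-tests listed later in the talk.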

Page 23

Challenges
• Very large data, difficult to visualize
• Too few samples (columns), usually < 100
• Too many genes (rows), usually > 10,000
• Testing so many genes is likely to produce false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest possible set of genes is needed
• The model needs to be explainable to biologists
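
To make the false-positive point concrete: if 10,000 genes are each tested at the 0.05 level and none is truly changed, about 500 are expected to come out "significant" by chance. One common remedy is the Benjamini-Hochberg adjustment, sketched below in Python. The slide does not name a specific correction, so the choice of method and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(pvals)
for p, q in zip(pvals, adj):
    print(f"raw={p:.3f}  adjusted={q:.5f}")
```

Genes are then called significant when the adjusted value falls below the chosen false discovery rate.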

Page 24

Types of Microarray Data Analysis
• Gene selection
  – find genes for therapeutic targets
• Classification (supervised)
  – identify disease (biomarker study)
  – predict outcome / select best treatment
• Clustering (unsupervised)
  – find new biological classes / refine existing ones
  – understand regulatory relationships/pathways
  – exploration

Page 25

Gene Selection
• Modified t-test
• Significance Analysis of Microarrays (SAM)
• Limma (linear models for microarray data)
• Linear mixed models
• Logistic regression
• Lasso (least absolute shrinkage and selection operator)
• Elastic net
• Etc.

Page 26

Visualization
• Dimensionality reduction
• PCA (Principal Component Analysis)
• Biplot
• Heatmap
• Multidimensional scaling
• Etc.

Page 27

Clustering
• Cluster the genes
• Cluster the arrays/conditions
• Cluster both simultaneously

• K-means
• Hierarchical
• Biclustering algorithms
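
As an illustration of the first method, here is a minimal k-means sketch in Python that clusters genes by the shape of their profiles. The four profiles are hypothetical, each gene is centered on its own mean first (a choice made here, not taken from the slides), and seeding the centroids with the first k points is an arbitrary simplification.

```python
def center(profile):
    """Subtract the gene's own mean so clustering reflects shape, not level."""
    m = sum(profile) / len(profile)
    return [x - m for x in profile]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles over two control and two treated arrays.
profiles = [
    [5.7, 5.7, 7.4, 7.3],  # rises under treatment
    [6.5, 6.6, 6.5, 6.6],  # flat
    [5.6, 5.9, 7.4, 7.5],  # rises under treatment
    [7.4, 7.4, 7.5, 7.3],  # flat, at a higher level
]
labels, centroids = kmeans([center(p) for p in profiles], k=2)
print(labels)
```

Transposing the input so that each point is an array rather than a gene gives the "cluster the arrays/conditions" variant.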

Page 28

Clustering
• Cluster or classify genes according to tumors
• Cluster tumors according to genes

Page 29

Page 30

Classification
• Linear discriminant analysis
• K-nearest neighbors
• Logistic regression
• L1-penalized logistic regression
• Neural networks
• Support vector machines
• Random forest
• Etc.
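
Of the classifiers listed, k-nearest neighbors is the simplest to write down. The sketch below is illustrative only: the two-gene expression values and the class labels are hypothetical, and Euclidean distance with a majority vote is just one common choice.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical expression of two genes in six labeled samples.
train_x = [[5.7, 5.6], [5.9, 5.8], [5.6, 5.9], [7.4, 7.4], [7.2, 7.5], [7.3, 7.3]]
train_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

print(knn_predict(train_x, train_y, [7.0, 7.1]))
print(knn_predict(train_x, train_y, [5.8, 5.7]))
```

In a real study the number of neighbors k and the genes used as features would be chosen on training data only, for example by cross-validation.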

Page 31

Aim: To improve understanding of host protein profiles during disease progression, especially in children.

Page 32

Classification of Malaria Subtypes
• Identify a panel of proteins that can distinguish between the different subtypes.
• Implement L1-penalized logistic regression.

Page 33

Penalized Logistic Regression
• Logistic regression is a supervised method for binary or multi-class classification.
• In high-dimensional data (e.g., microarrays) there are more variables than observations, so classical logistic regression does not work.
• Other problems: correlated variables (multicollinearity) and overfitting.
• Solution: introduce a penalty for model complexity.

Page 34

Penalized Logistic Regression
Logistic model:

Maximize the log-likelihood:

• L1-penalization (Lasso):
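
The formulas on this slide did not survive in the transcript. For orientation, the standard textbook forms for a binary outcome are given below. The notation is an assumption and may differ from the original slide: outcome y_i, predictor vector x_i, intercept β_0, coefficient vector β, and penalty weight λ.

```latex
% Logistic model: probability that sample i belongs to class 1
\pi_i = \Pr(y_i = 1 \mid x_i)
      = \frac{\exp(\beta_0 + x_i^{\top}\beta)}{1 + \exp(\beta_0 + x_i^{\top}\beta)}

% Log-likelihood to be maximized over n samples
\ell(\beta_0, \beta) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \right]

% L1-penalized (lasso) criterion over m variables; the intercept is not penalized
(\hat{\beta}_0, \hat{\beta}) = \arg\max_{\beta_0,\, \beta}
    \left\{ \ell(\beta_0, \beta) - \lambda \sum_{j=1}^{m} |\beta_j| \right\}
```

Setting λ = 0 gives ordinary logistic regression, while a larger λ forces more coefficients to zero.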

Page 35

L1 Penalized Logistic Regression
• Shrinks all regression coefficients (β) toward zero and sets some of them exactly to zero.
• Performs parameter estimation and variable selection at the same time.
• The choice of λ is crucial; it is chosen via k-fold cross-validation.
• The procedure is implemented in the R package penalized.
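
To show how an L1 penalty sets coefficients exactly to zero, here is a small Python sketch using proximal gradient steps with soft-thresholding. It is not the algorithm of the penalized package. The data are hypothetical, and the criterion uses the mean log-likelihood, so λ here is on a different scale from a criterion based on the summed log-likelihood.

```python
from math import exp


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, amount):
    """Shrink toward zero by `amount`; anything within `amount` of zero becomes 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.1, iterations=2000):
    """Maximize mean log-likelihood minus lam * sum(|beta_j|) by proximal gradient."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]
        resid = [yi - pi for yi, pi in zip(y, probs)]
        intercept += step * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            beta[j] = soft_threshold(beta[j] + step * grad_j, step * lam)
    return intercept, beta


# Hypothetical data: the first variable separates the classes, the other two are noise.
X = [
    [1.2, 0.3, -0.1], [0.9, -0.4, 0.2], [1.5, 0.1, 0.3], [0.7, -0.2, -0.3],
    [-1.1, 0.2, 0.1], [-0.8, -0.3, -0.2], [-1.4, 0.4, 0.2], [-0.9, -0.1, -0.2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for lam in (1.0, 0.1):
    _, beta = fit_l1_logistic(X, y, lam)
    print(lam, [round(b, 3) for b in beta])
```

In practice λ would be picked by k-fold cross-validation, as the slide says: fit on k-1 folds for each candidate λ, score the held-out fold, and keep the λ with the best average score.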

Page 36

Classification of Severe Malaria Anemia vs. Uncomplicated Malaria group
AUC: 0.86
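
The AUC can be read as the probability that a randomly chosen case from one group receives a higher classifier score than a randomly chosen case from the other. A minimal sketch of that pairwise definition follows. The scores are hypothetical and are not the study's data.

```python
def auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities for four cases in each group.
severe = [0.9, 0.8, 0.6, 0.4]
uncomplicated = [0.7, 0.5, 0.3, 0.2]
print(auc(severe, uncomplicated))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.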

Page 37

Dose-response Microarray Studies

Page 38

Dose-response Microarray Studies
Implemented in the R packages IsoGene and IsoGeneGUI.
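
The slide names only the packages. Tests for a monotone dose-response trend of the kind IsoGene provides are built on isotonic regression of the dose-level means, so a small Python sketch of the pool-adjacent-violators algorithm is given here for orientation. It illustrates the building block only; it is not IsoGene's code, and the dose means and replicate counts are hypothetical.

```python
def isotonic_increasing(means, weights):
    """Pool-adjacent-violators: weighted least-squares fit that never decreases."""
    blocks = []  # each block: [pooled mean, total weight, number of doses pooled]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # Merge backwards while a block is larger than the one after it.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, _, count in blocks:
        fitted.extend([m] * count)
    return fitted


# Hypothetical mean expression of one gene at four increasing doses, 3 arrays per dose.
dose_means = [5.0, 5.4, 5.2, 6.1]
fitted = isotonic_increasing(dose_means, weights=[3, 3, 3, 3])
print(fitted)
```

The second and third doses violate the increasing order, so their means are pooled. A decreasing trend can be handled by negating the means, fitting, and negating the result.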

Page 39

Dose-response Microarray Studies

Page 40

Gene Signature for Prostate Cancer

Page 41

Gene Signature for Prostate Cancer

Page 42

Gene Signature for Prostate Cancer

Page 43

Next Generation Sequencing

Page 44

Next Generation Sequencing
Reading the order of bases of DNA fragments

Page 45

NGS is used for:
• Whole-genome re-sequencing
• Metagenomics
• Cancer genomics
• Exome sequencing (targeted)
• RNA sequencing
• ChIP-seq
• Genomic epidemiology

Page 46

Next Generation Sequencing
• Produces massive amounts of data, fast
• The problem is storage and analysis

Page 47

RNA-seq Pipeline
• Align reads to a reference genome using TopHat.
Pramana et al., NBBC 2013. Source: Trapnell et al., 2010

Page 48

RNA-seq Pipeline
• Measure gene expression using Cufflinks: FPKM (Fragments Per Kilobase of transcript per Million mapped reads).
[Figure labels: Reference Gene; Transcript 1, Transcript 2; Isoform/Transcript FPKM; Gene FPKM; Sample 1, Sample 2, Sample 3]
Source: Trapnell et al., 2013
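
The FPKM definition translates directly into arithmetic: fragments assigned to a transcript, divided by the transcript length in kilobases and by the total mapped fragments in millions. The sketch below uses hypothetical counts. It skips the step where Cufflinks estimates how ambiguous fragments are shared between isoforms, and taking the gene-level value as the sum over its isoforms is an assumption made here.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical counts for one gene with two isoforms in a single sample.
total_mapped = 20_000_000
transcript_1 = fpkm(fragments=500, length_bp=2000, total_mapped_fragments=total_mapped)
transcript_2 = fpkm(fragments=150, length_bp=1500, total_mapped_fragments=total_mapped)

# Assumption: gene-level FPKM taken as the sum of its isoform FPKMs.
gene_level = transcript_1 + transcript_2
print(transcript_1, transcript_2, gene_level)
```

Dividing by length and by sequencing depth is what makes values comparable across transcripts of different sizes and samples of different depths.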

Page 49

Page 50

Subtype-specific Transcripts/Isoforms
• Breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA) project.
• 329 tumor samples.
• Platform: Illumina
• Paired-end reads (length 50 bp).
• 20-100 million reads

Page 51

Subtype-specific Transcripts/Isoforms
• To discover transcripts/isoforms that are significantly highly or lowly expressed only in a certain cancer subtype.

Page 52

Analysis Flow
329 TCGA samples
• Discovery set: 179 samples
• Validation set:
  - TCGA: 150 samples
  - External samples

Classification to molecular subtypes
- Use Swedish microarray data as training data.
- Based on gene-level FPKM
- Median and variance normalization
- K-nearest neighbor
- Classifier gene selection

Subtype-specific transcripts
- Transcript-level FPKM of all genes
- For each transcript: robust contrast tests.
- Multiple testing adjustment.

Page 53

Subtype-specific Transcripts/Isoforms

Page 54

Subtype-specific Transcripts/Isoforms

Page 55

Subtype-specific Transcripts/Isoforms

Page 56

Software?
• R is now growing, especially in bioinformatics
  – Statistics, data analysis, machine learning
  – Free
  – High quality
  – Open source
  – Extendable (you can submit and publish your own package!)
  – Can be integrated with other languages (C/C++, Java, Python)
  – Large, active user community
  – Command-based (a drawback)

Page 57

My Current Research
• Integration of Somatic Mutation, Expression and Functional Data Reveals Potential Driver Genes Predictive of Breast Cancer Survival (KI, Ewha Univ, Brescia Univ).
• Molecular Subtyping of Breast Cancers using RNA-Sequence Data (KI, Ewha Univ, Brescia Univ).
• The genomic surveillance of drug-resistant tuberculosis (FKUI, NUS).
• Genomics screening for prostate cancer (KI)
• Molecular subtyping of Malaria (KI, Scilab, Eijkman Inst.)
• Health Technology Assessment (FKUI, Depkes)

Page 58

Summary
• Statistics plays an important role in developing personalized medicine
• It is a multidisciplinary field that needs collaboration among different experts.
• Bioinformatician is one of the sexiest jobs
• Big Data in medicine: numerous opportunities to be explored and discovered.

Page 59

Thank you for your attention...