Stephen Friend Norwegian Academy of Science and Letters 2011-11-02
-
Upload
sage-base -
Category
Health & Medicine
-
view
397 -
download
2
description
Transcript of Stephen Friend Norwegian Academy of Science and Letters 2011-11-02
Future of Genetics in Medicine
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam
The Norwegian Academy of Science and Letters, Oslo November 2, 2011
Why not use data intensive science to build models of disease
Current Reward Structures
Organizational Structures and Tools
Pilots
Opportunities
Alzheimers Diabetes
Autism Cancer Treating Symptoms v.s. Modifying Diseases
Will it work for me?
7
The Current Pharma Model is Broken:
• In 2010, the pharmaceutical industry spent ~$100B for R&D
• Half of the 2010 R&D spend ($50B) covered pre-PH III activities
• Half of the pre-PH III costs ($25B) were for program targets that at least one other pharmaceutical company was actively pursuing
• Only 8% of pharma company small molecule PCCs make it to PH III
• In 2010, only 21 new medical entities were approved by FDA
7
What is the problem?
• Regulatory hurdles too high? • Low hanging fruit picked? • Companies not large enough to execute on strategy? • Internal research costs too high? • Clinical trials in developed countries too expensive?
In fact, all are true but none is the real problem
What is the problem?
We need a better understand disease biology before testing proprietary compounds on sick patients
What is the problem?
Most approved therapies assumed indications represent homogenous populations
Our existing disease models often assume pathway knowledge sufficient to infer correct therapies
Personalized Medicine 101: Capturing Single bases pair mutations = ID of responders
Reality: Overlapping Pathways
The value of appropriate representations/ maps
Equipment capable of generating massive amounts of data
“Data Intensive” Science- Fourth Scientific Paradigm
Open Information System
IT Interoperability
Host evolving computational models in a “Compute Space”
WHY NOT USE“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
what will it take to understand disease?
DNA RNA PROTEIN (dark ma>er)
MOVING BEYOND ALTERED COMPONENT LISTS
2002 Can one build a “causal” model?
trait
How is genomic data used to understand biology?
“Standard” GWAS Approaches Profiling Approaches
“Integrated” Genetics Approaches
Genome scale profiling provide correlates of disease Many examples BUT what is cause and effect?
Identifies Causative DNA Variation but provides NO mechanism
Provide unbiased view of molecular physiology as it
relates to disease phenotypes
Insights on mechanism
Provide causal relationships and allows predictions
RNA amplification Microarray hybirdization
Gene Index
Tum
ors
Tum
ors
21
Causal Inference
Constructing Co-expression Networks
Start with expression measures for ~13K genes most variant genes across 100-150 samples
Note: NOT a geneexpression heatmap
1 -0.1 -0.6 -0.8 -0.1 1 0.1 0.2
-0.6 0.1 1 0.8 -0.8 0.2 0.8 1
1
2
3
4
1 2 3 4
Correla9on MatrixBrain sample
expression
1 0 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1
2
3
4
1 2 3 4
Connec9on Matrix
1 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 1
2
4
3
1 2 4 3
41
32
Establish a 2D correla9onmatrix for all gene pairs
Define Thresholdeg >0.6 for edge
Clustered Connec9on Matrix
Hierarchicallycluster
sets of genes for which manypairs interact (relaPve to thetotal number of pairs in thatset)
Network Module
Iden9fymodules
Constructing Bayesian Networks
Preliminary Probalistic Models- Rosetta- Eric Schadt
Gene symbol Gene name Variance of OFPM explained by gene expression*
Mouse model
Source
Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg
Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple
(UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg
(Columbia University, NY) [11] C3ar1 Complement component
3a receptor 1 46% ko Purchased from Deltagen, CA
Tgfbr2 Transforming growth factor beta receptor 2
39% ko Purchased from Deltagen, CA
Networks facilitate direct identification of genes that are
causal for disease Evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
Map compound signatures to disease networks
Sub-‐networkcontains genesassociatedwith toxici5es
Sub-‐network contains genesassociated with diabetes traits
Sub-‐network contains genesassociated with obesity traits
1
2
3
Compound 1: Drug signature significantly enriched in subnetwork associated with diabetes traits
Compound 2: Drug signature significantly enriched in subnetwork associated with obesity traits
Compound 3: Drug signature significantly enriched in subnetwork associated with obesity traits BUT also in subnetwork associated with toxicities
Compound Gene expression signatures
Tissue Disease Networks
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
“..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
"An integrative genomics approach to infer causal associations ...” Nat Genet. (2005)
"Increasing the power to detect causal associations… “PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
Metabolic Disease
CVD
Bone
Methods
Extensive Publications now Substantiating Scientific Approach Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
50 network papers http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling
(Eric Schadt)
Equipment capable of generating massive amounts of data A-
“Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences
Open Information System D-
IT Interoperability D
Host evolving computational models in a “Compute Space F
“hunter gathers”- not sharing
TENURE FEUDAL STATES
Clinical/genomic data are accessible but minimally usable
Little incentive to annotate and curate data for other scientists to use
Mathematical models of disease are not built to be
reproduced or versioned by others
Assumption that genetic alterations in human conditions should be owned
Publication Bias- Where can we find the (negative) clinical data?
Sage Mission
Sagebase.org
Data Repository
Discovery Platform
Building Disease Maps
Commons Pilots
Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the elimination of human disease
Sage Bionetworks Collaborators
Pharma Partners Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson &Johnson
38
Foundations Kauffman CHDI, Gates Foundation
Government NIH, LSDF
Academic Levy (Framingham) Rosengren (Lund) Krauss (CHORI)
Federation Ideker, Califarno, Butte, Schadt
RULES GOVERN
PLAT
FORM
NEW
MAP
S NEW MAPS
Disease Map and Tool Users- ( Scientists, Industry, Foundations, Regulators...)
PLATFORM Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
NEW TOOLS Data Tool and Disease Map Generators- (Global coherent data sets, Cytoscape,
Clinical Trialists, Industrial Trialists, CROs…)
RULES AND GOVERNANCE Data Sharing Barrier Breakers-
(Patients Advocates, Governance and Policy Makers, Funders...)
Sage Datasets- 49+
40
Sage Neuro Collaborations
41
Neurodegenerative
• Huntington’s Disease : Marcy MacDonald/Jim Gusella (MGH) • $2.5M Grant to CHDI to fund generation of RNA-seq data for brain regions and peripheral tissues for well phenotyped cohorts (also methyl genome studies)
• HD/AD/PD : MacDonald/Gusella (MGH), Myers (BU), Paulsen (Iowa) • $6.8M RC4 grant to NIH to fund generation of SNP and RNA-seq data for brain regions from HD, AD, PD for well phenotyped cohorts
• Alzheimer’s Disease: Green (BU), Johnson (UWM) • Collaboration opportunity around longitudinal neuroimaging & cognitive phenotyped cohorts (ADNI, ADGC, etc). Intersect gene expression studies. Funding for further data generation
Psychiatric • Autism/ Schizophrenia- Consortium
Sleep & Stress • Genetics of Sleep: Turek (Northwestern)
• Collaboration with Turek lab & Merck focused on mouse. DARPA • Enabling Stress Resistance
• DARPA-funded collaboration with Turek lab to look at genetic and brain molecular mechanisms that regulate physical and emotional stress responses in mouse • Currently looking to expand to human
Alzheimer’s Disease
• Cross-‐Pssue coexpression networks for both normal and AD brains
– prefrontal cortex, cerebellum, visual cortex
• DifferenPal network analysis on AD and normal networks
• Integrate coexpression networks and Bayesian networks to idenPfy key regulators for the modules associated with AD
42
nerve ensheathment
Glutathione transferaseGain connecPvity by 91 fold
Lose connecPvity by 40%
Module ConnecPvity Change (AD/Normal)
IdenPficaPon of Disease (AD) Pathways via ComparaPveGene Network Analysis
40,000 genes from three Pssues
Bayesian Subnetworks
Control(PFC, CB, VC)
AD(PFC, CB, VC)
43
44 44
DifferenPally Connected Modules in AD
• Unfolded protein response (UPR) • AKT HIF1 VEGF
• Olfactory receptor acPvity • Sensory percepPon of smell chemical sPmulus
• Inflammatory Response
• Extra cellular matrix (ECM)
• SynapPc transmission (suppressed)
• Nerve ensheathment
44
Key RegulatorsPECAM1: Platelet-‐endothelial celladhesion molecule, a tyrosinephosphatase acPvator that plays arole in the platelet acPvaPon,increased expression correlateswith MS, Crohn disease, chronic B-‐cell leukemia, rheumatoid arthriPs,and ulceraPve coliPs
ENPP2: Phosphodiesterase I alpha,a lysophospholipase that acts inchemotaxis, phosphaPdic acidbiosynthesis, regulates apoptosisand PKB signaling; aberrantexpression is associated withAlzheimer type demenPa, majordepressive disorder, and variouscancers
SLC22A25: solute carrier family 22,member 25, Protein with highsimilarity to mouse Slc22a19, whichis a renal steroid sulfate transporterthat plays a role in the uptake ofestrone sulfate, member of thesugar (and other) transporter familyand the major facilitatorsuperfamily
Glutathione Transferase Module (Pink)
• 983 probes from all three brain regions (9% from CB, 15% from PFC and 76% from VC)• Most predicPve of Braak severity score
GlutathioneTransferase NerveEnsheathment ExtracellularMatrix
45
Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning
Leveraging Existing Technologies
Taverna
Addama
tranSMART
INTEROPERABILITY
INTEROPERABILITY
Genome Pattern CYTOSCAPE tranSMART I2B2
SYNAPSE
Watch What I Do, Not What I Say Reduce, Reuse, Recycle
Most of the People You Need to Work with Don’t Work with You
My Other Computer is Amazon
sage bionetworks synapse project
CTCAPArch2POCMThe FederaPonPortable Legal ConsentSage Congress ProjectAshoka/Sage MedXChange
Six Pilots at Sage Bionetworks
RULES GOVERN
PLAT
FORM
NEW
MAP
S
Clinical Trial Comparator Arm Partnership “CTCAP” Strategic Opportunities For Regulatory Science
Leadership and Action
FDA September 27, 2011
CTCAP
Clinical Trial Comparator Arm Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.
Started Sept 2010
Arch2POCM
Restructuring the PrecompePPveSpace for Drug Discovery
How to potenPally De-‐RiskHigh-‐Risk TherapeuPc Areas
The FederaPon
2008 2009 2010 2011
How can we accelerate the pace of scientific discovery?
Ways to move beyond “traditional” collaborations?
Intra-lab vs Inter-lab Communication
Colrain/ Industrial PPPs Academic Unions
human aging: predicting bioage using whole blood methylation
!
!!!
!
!!!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!!!!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!!
!
!
!
!
!
40 50 60 70 80 90 100
40
60
80
10
0
Training Cohort: San Diego (n=170)
Chronological Age
Bio
log
ica
l A
ge
RMSE=3.35
!
!!
!
!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
! !
!
!
!!
!
!
!
!
!
!!
!
!
!!
!
!!
!!
!
!
!
!!!
!!
!
!
!
!
!
!
!!
!!
!!
!
!!
!
!!
!
!!
!
!
!
!!!
!
!
!
! !
!
!
!!
!
!
!
!!
!
!
!!
! !
!!!
!
!
!
!!
!
!
!!
!
!
!!
40 50 60 70 80 90
40
60
80
10
0
Validation Cohort: Utah (n=123)
Chronological Age
Bio
log
ica
l A
ge
RMSE=5.44
• Independent training (n=170) and validation (n=123) Caucasian cohorts • 450k Illumina methylation array • Exom sequencing • Clinical phenotypes: Type II diabetes, BMI, gender…
sage federation: model of biological age
Faster Aging
Slower Aging
Clinical Association - Gender - BMI - Disease Genotype Association Gene Pathway Expression Pr
edictedAge
(liverexpression
)
Chronological Age (years)
Age Differential
Reproducible science==shareable science
Sweave: combines programmatic analysis with narrative
Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –
Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
Dynamic generation of statistical reports using literate data analysis
Federated Aging Project : Combining analysis + narraPve
=Sweave Vignette Sage Lab
Califano Lab Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
R code + narrative
PDF(plots + text + code snippets)
Data objects
HTML
Submitted Paper
Portable Legal Consent
(AcPvaPng PaPents)
John Wilbanks
Sage Congress ProjectApril 20 2012
RAParkinson’sAsthma
(Responders CompePPons)
Ashoka/Sage
MedXChange
Why not use data intensive science to build models of disease
Current Reward Structures
Organizational Structures and Tools
Six Pilots
Opportunities
IMPACT ON PATIENTS
IMPACT ON PATIENTS
IMPACT ON PATIENTS