Friend NIEHS 2013-03-01
If not
Integrating genomes and networks to understand health and disease
Examples of being Naive:
Expression Profiles
2000
Examples of being Naive:
DNA Alterations
Examples of being Naive:
Synthetic Lethal Screens
Examples of being Naive:
Drugs and Trials
PARP IGF1-R m-TOR VEGF-R Wee-1
Reality: Overlapping Pathways
• alchemist
How often are we hurt by going from the particular to the general in very complex systems driven by context?
Is this going from the particular to the general a central problem in Hypothesis Driven Biomedical Research?
How often do we inappropriately praise findings that go on to have awkward adjacencies?
TENURE FEUDAL STATES
What could be done by us?
BUILDING PRECISION MEDICINE
Extensions of Current Institutions
Proprietary Short term Solutions
Open Systems of Sharing in a Commons
Massive amounts of human “omics” and compound data
Network Modeling Approaches for Diseases are emerging
IT Infrastructure and Cloud compute capacity allows a generative open approach to solving problems
Nascent Movement for patients to Control Sensitive information allowing sharing
Open Social Media allows citizens and experts to use gaming to solve problems
1- Now possible to generate massive amounts of human “omics” data
2- Network Modeling Approaches for Diseases are emerging
3- IT Infrastructure and Cloud compute capacity allows a generative open approach to biomedical problem solving
4- Nascent Movement for patients to Control Sensitive information allowing sharing
5- Open Social Media allows citizens and experts to use gaming to solve problems
A HUGE OPPORTUNITY -- A HUGE RESPONSIBILITY
We focus on a world where biomedical research is about to fundamentally change. We think it will often be conducted in an open, collaborative way where teams of teams far beyond the current guilds of experts will contribute to making better, faster, more relevant discoveries.
Better Models of Disease: KNOWLEDGE NETWORK
Technology Platform
Rewards/Challenges
Impactful Models
Governance
1) Identifying key disease systems and genes- Alzheimer’s Gaiteri et al.
Example “modules” of coexpressed genes, color-coded
1.) Identify groups of genes that move together – coexpressed “modules”
- correlated expression of multiple genes across many patients
- coexpression calculated separately for Disease/healthy groups
- these gene groups are often coherent cellular subsystems, enriched in one or more GO functions
1.) Identify groups of genes that move together – coexpressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures
Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease
vs
Infer directed/causal relationships and clear hierarchical structure by incorporating eSNP information
(no hair-balls here) vs
Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease
1.) Identify groups of genes that move together – coexpressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures
3.) Incorporate genetic information to find directed relationships between genes
1) Identifying key disease systems and genes- Alzheimer’s Example network finding: microglia activation
Module selection – what identifies these modules as relevant to Alzheimer’s disease?
The eigengene of a module of ~400 probes correlates with Braak score, age, cognitive disease severity and cortical atrophy. Members of this module are on average differentially expressed (both up- and down-regulated).
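A module eigengene is commonly taken to be the first principal component of the module's standardized expression values, giving one summary score per sample that can then be correlated with a clinical measure. The sketch below is one minimal way to do that, under stated assumptions: the data are simulated, `severity` is a hypothetical stand-in for a measure such as Braak score, and power iteration replaces a full SVD. It is not the pipeline behind the slide.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def module_eigengene(expr, n_iter=200):
    """Return the first principal component score for each sample.

    expr is a list of samples, each a list of expression values for the
    probes in one module. Power iteration stands in for a full SVD.
    """
    n_probes = len(expr[0])
    columns = [standardize([row[g] for row in expr]) for g in range(n_probes)]
    z = [list(row) for row in zip(*columns)]  # samples x probes
    loading = [1.0 / math.sqrt(n_probes)] * n_probes
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, loading)) for row in z]
        raw = [sum(z[s][g] * scores[s] for s in range(len(z)))
               for g in range(n_probes)]
        norm = math.sqrt(sum(v * v for v in raw))
        loading = [v / norm for v in raw]
    scores = [sum(a * b for a, b in zip(row, loading)) for row in z]
    # The sign of a principal component is arbitrary; orient it to follow
    # the average standardized expression of the module.
    mean_expr = [sum(row) / n_probes for row in z]
    if pearson(scores, mean_expr) < 0:
        scores = [-s for s in scores]
    return scores


# Simulated data (hypothetical): 60 samples, 8 probes driven by one latent factor.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(60)]
expr = [[0.8 * f + rng.gauss(0, 0.5) for _ in range(8)] for f in latent]
severity = [f + rng.gauss(0, 0.3) for f in latent]  # stand-in clinical score

eigengene = module_eigengene(expr)
r_severity = pearson(eigengene, severity)
print(f"eigengene vs. severity: r = {r_severity:.2f}")
```

With real data, modules whose eigengene tracks several clinical measures at once would be the ones prioritized.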
Evidence these modules are related to microglia function
The members of this module are enriched with GO categories (p<.001) such as “response to biotic stimulus” that are indicative of immunologic function for this module. The microglia markers CD68 and CD11b/ITGAM are contained in the module (this is rare – even when a module appears to represent a specific cell-type, the histological markers may be lacking). Numerous key drivers (SYK, TREM2, DAP12, FC1R, TLR2) are important elements of microglia signaling.
Alzgene hits found in co-regulated microglia module:
Figure key:
Five main immunologic families found in Alzheimer’s-associated module.
Square nodes in surrounding network denote literature-supported nodes. Node size is proportional to connectivity in the full module.
(Interior circle) Width of connections between 5 immune families is linearly scaled to the number of inter-family connections.
Labeled nodes are either highly connected in the original network, implicated by at least 2 papers as associated with Alzheimer’s disease, or core members of one of the 5 immune families.
Core family members are shaded.
Transforming networks into biological hypotheses
Design-stage AD projects at Sage
Fusing our expertise in…
Join us in uniting genes, circuits and regions to build multi-scale biophysical disease models. Contact [email protected]
Diffusion Spectrum Imaging
Microcircuits & neuronal diversity
Gene regulatory networks
Feedback
1) Identifying key disease systems and genes- Alzheimer’s
N=587 P<0.0001
N=944, P<0.0001
2) Identifying genetic biomarkers of statin response from cellular expression changes in treated LCLs
Differential eQTL analysis
Identifying local “cis” acting genetic effects
Differential network analysis
Identifying “trans” acting genetic effects.
Genotypes
2 µM simvastatin
Control
Clinical simvastatin trial Cellular Simvastatin exposure
N=480
Lara Mangravite
Differential eQTL analysis identifies loci for which genetic association with gene expression is altered by statin treatment
[Figure: expression by genotype (AA, AG, GG) in three panels: Control (log10BF=0.52), Simvastatin (log10BF=7.1*), Difference, Control vs. Simvastatin (log10BF=5.7*)]
Diff-eQTL locus is associated with reduced incidence of statin-induced myopathy
Lara Mangravite
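The idea of a differential eQTL can be sketched with paired data: each cell line has an expression value with and without simvastatin, and the question is whether the genotype effect differs between the two. The slide reports Bayes factors; the sketch below swaps in a plain least-squares slope and t statistic on the treated-minus-control difference, which is a simpler frequentist stand-in and not the method used in the study. All data are simulated, and `differential_eqtl` is a hypothetical helper name.

```python
import math
import random


def slope_and_t(x, y):
    """Least-squares slope of y on x, and the slope's t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def differential_eqtl(dosage, control, treated):
    """Genotype effect in each condition and on the paired difference.

    dosage counts copies of one allele (0, 1 or 2) per cell line;
    control and treated are expression values for the same lines.
    """
    delta = [t - c for c, t in zip(control, treated)]
    return {
        "control": slope_and_t(dosage, control),
        "simvastatin": slope_and_t(dosage, treated),
        "difference": slope_and_t(dosage, delta),
    }


# Simulated example (hypothetical): the genotype only matters under treatment.
rng = random.Random(1)
dosage = [rng.choice([0, 1, 2]) for _ in range(480)]
control = [rng.gauss(0, 0.5) for _ in dosage]
treated = [0.5 * g + rng.gauss(0, 0.5) for g in dosage]

result = differential_eqtl(dosage, control, treated)
for name, (slope, t) in result.items():
    print(f"{name}: slope = {slope:.2f}, t = {t:.1f}")
```

A locus like the one on the slide would show little association in control cells and a clear one under treatment and in the difference.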
Differential network analysis:
Partial correlation, FDR=5% and PP>0.90
By integrating statin-mediated changes in gene correlation with eQTLs, we identify genes predicted to alter cholesterol homeostasis and lipoprotein metabolism.
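Partial correlation asks whether two genes remain correlated once a third variable is accounted for. The slide does not say how the partial correlations were estimated, so the sketch below uses the textbook first-order formula on simulated data; the FDR and posterior-probability thresholds from the slide are not reproduced, and the variable names are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated example (hypothetical): genes x and y are both driven by z,
# so they look correlated until z is controlled for.
rng = random.Random(2)
z = [rng.gauss(0, 1) for _ in range(300)]
x = [v + rng.gauss(0, 0.5) for v in z]
y = [v + rng.gauss(0, 0.5) for v in z]

marginal = pearson(x, y)
partial = partial_correlation(x, y, z)
print(f"marginal r = {marginal:.2f}, partial r = {partial:.2f}")
```

In a differential network setting, the same quantity would be computed separately in control and treated cells and the two values compared.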
Knockdown of candidate gene in hepatocytes confirms alterations in lipoprotein metabolism
78.1±8.0% gene knockdown, Huh7 cells
Lara Mangravite
(including one involved in creatine biosynthesis)
3) Classification of transporter-mediated hepatotoxicity
Bile Salt Exporter BSEP (Amgen)
1. Characterization of differential expression following compound exposures in rat liver
2. Classification of response to compounds by BSEP Inhibitor Status (rat IC50)
3. Development of classifier for predicting BSEP inhibition of unknown compounds
4. Validation
AUC=0.98 (5-fold cross-validation)
Mangravite, Jang, Mecham, Derry
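A cross-validated AUC like the one quoted above can be computed by holding out one fifth of the compounds at a time, fitting on the rest, scoring the held-out compounds, and then ranking all held-out scores against the true labels. The slide does not name the classifier, so the sketch below uses a very simple stand-in (projection onto the difference of class means) on simulated expression features; every name and number in it is hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_mean_difference(X, y):
    """Stand-in classifier: direction between the two class centroids."""
    mu1 = centroid([row for row, label in zip(X, y) if label == 1])
    mu0 = centroid([row for row, label in zip(X, y) if label == 0])
    weights = [a - b for a, b in zip(mu1, mu0)]
    midpoint = [(a + b) / 2 for a, b in zip(mu1, mu0)]
    return weights, midpoint


def score(model, row):
    """Signed projection of a sample relative to the centroid midpoint."""
    weights, midpoint = model
    return sum(w * (v - m) for w, v, m in zip(weights, row, midpoint))


def cross_validated_auc(X, y, k=5, seed=0):
    """Pool held-out scores from k folds and compute one AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    held_out_scores = [0.0] * len(X)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_mean_difference([X[i] for i in train],
                                    [y[i] for i in train])
        for i in fold:
            held_out_scores[i] = score(model, X[i])
    return auc(held_out_scores, y)


# Simulated example (hypothetical): 100 compounds, 5 expression features,
# inhibitors (label 1) shifted upward on every feature.
rng = random.Random(3)
labels = [i % 2 for i in range(100)]
X = [[1.5 * label + rng.gauss(0, 1) for _ in range(5)] for label in labels]

cv_auc = cross_validated_auc(X, labels)
print(f"5-fold cross-validated AUC = {cv_auc:.2f}")
```

Because every compound is scored by a model that never saw it, the pooled AUC estimates how well the classifier would rank unknown compounds.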
How It All Fits Together
DREAM Challenges
Synapse
Data Generation
BRIDGE Data
Activation
FEDERATION
On-Line Open Generative
Communities
Portable Legal Consent
2009-2010
Access to Data Sets
How It All Fits Together
2010-2011
two approaches to building common scientific knowledge
Text summary of the completed project
Assembled after the fact
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time
Social Coding
TECHNOLOGY PLATFORM
Synapse is GitHub for Biomedical Data
• Data and code versioned
• Analysis history captured in real time
• Work anywhere, and share the results with anyone
• Social/Interactive Science
• Every code change versioned
• Every issue tracked
• Every project the starting point for new work
• Social/Interactive Coding
Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
“Synapse is a nascent compute platform for transparent, reproducible, and modular collaborative research.”
Currently at 16K+ datasets and ~1M models
Download analysis and meta-analysis
Download another Cluster Result Download Evaluation and view more stats
• Perform Model averaging
• Compare/contrast models
• Find consensus clusters
• Visualize in Cytoscape
Pancancer collaborative subtype discovery
Objective assessment of factors influencing model performance (>1 million predictions evaluated)
Prediction accuracy improved by…
Not discretizing data
Including expression data
Elastic net regression
[Figure: cross-validation prediction accuracy (R2) for Sanger (130 compounds) and CCLE (24 compounds)]
In Sock Jang
How It All Fits Together
2011-2012
(Nolan and Haussler)
THE FEDERATION
Schadt Ideker Friend Califano Nolan Vidal
How It All Fits Together
2012-2013
Sage-DREAM Breast Cancer Prognosis Challenge #1 Building better disease models together
154 participants; 27 countries
334 participants; >35 countries
>500 models posted to Leaderboard
breast cancer data
Challenge Launch: July 17
Sep 26 Status
Caldos/Aparicio
How It All Fits Together
2012-2013
GOVERNANCE: PORTABLE LEGAL CONSENT Control of Private information by Citizens allows sharing
weconsent.us
John Wilbanks
• Online educational wizard
• Tutorial video
• Legal Informed Consent Document
• Profile registration
• Data upload
John Wilbanks TED Talk “Let’s pool our medical data” weconsent.us
How It All Fits Together
2012-2013
BRIDGE
How It All Fits Together
2013-2014
IMPACT
virtual machine
A ‘clearScience’ way of modeling PI3K pathway activation in breast cancer
web-accessible DATA
web-accessible SOURCE CODE
web-accessible PROVENANCE
web-accessible MODEL
sage bionetworks
metaGenomics/pan-cancer project collaboration with David Haussler @ UCSC for “analysis-ready” TCGA data
TCGA breast RNAseq data
TCGA breast exome seq data
R code for a pathway heuristic
random forest model of PI3K activation
executable PI3K model binary
World Wide Web Consortium (W3C) specification PROVENANCE for all the interconnections above
all of these elements can be housed in an
THE DREAM PROJECT JOINS SAGE BIONETWORKS TO ENABLE COLLABORATIVE SCIENCE
How to incent the joint evolution of ideas in a rapid learning space, prepublication?
How to fund where data generators and analysts are not always the same people, repeatedly?
Should we consider Centralized Guilds and Distributed Dynamic Teams to perform gene-environment model building?
If not
SYNAPSE FEDERATION PORTABLE LEGAL CONSENT CHALLENGES BRIDGE CITIZEN ENGAGEMENT