A Connected Digital Biomedical Research Enterprise with Big Data Belinda Seto, Ph.D. Deputy Director National Eye Institute


Page 1

A Connected Digital Biomedical Research Enterprise with Big Data

Belinda Seto, Ph.D.

Deputy Director

National Eye Institute

Page 2

What is it?

Digital research assets: data, workflow, publications, software

To connect these assets: unique identifiers or tags, annotation, community-developed standards, interfaces

Page 3

Benefits

Increase scientific productivity

Enhance collaborations

Foster creativity: new tools, algorithms, methods, modeling

Enable new discoveries

Improve interoperability

Facilitate reproducibility

Page 4

Gene Expression Data

Barrett T et al. Nucl. Acids Res. 2013;41:D991-D995. Published by Oxford University Press 2012.

Volume, Velocity, Variety

Page 5

Gene Expression Omnibus

A public repository (NLM) of microarray, next-generation sequencing, and functional genomic data

Web-based interface and apps for query and data download

Page 6

Myriad Data Types

Other ‘Omic

Imaging

Phenotypic

Clinical

Genomic

Exposure

Page 7

Making Big Data Functional

Engender interdisciplinary approach to data collection and analysis by integrating scientific, algorithmic, and computational work

Drive functional data collection and analysis that has practical value in determining risk alleles

Page 8

Integration of Data

Opportunities: Understanding biology across scales, from molecules to population

Challenges: need access to primary data and processed data, machine-readable metadata, tools to reduce dimensionality

Page 9

Integration of Disparate Data Types: Brain Images with Genomic

Page 10

Brain measures versus epidemiological studies to find genetic variants that directly affect the brain

DIFFICULT (epidemiological studies): may require 10,000-30,000 people, e.g., the Psychiatric Genetics Consortium studies

EASIER? (brain measures): gene variants (SNPs) may affect brain measures directly, and many brain measures relate to disease status.

Page 11

Finding Genetic Variants Influencing Brain Structure

Figure: aligned DNA sequences (CTAGTCAGCGCT and CTAGTAAGCGCT) that differ at a single position, the SNP; intracranial volume is plotted by genotype (C/C, A/C, A/A).

Phenotype, Genotype, Association
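The genotype-to-phenotype association in the figure can be sketched as a simple regression. This is a minimal illustration, assuming an additive model in which each copy of allele A shifts intracranial volume by a fixed amount; the slide does not state which model was used. The function names and all data values are hypothetical, invented for the example.

```python
from statistics import mean


def allele_dosage(genotype, effect_allele):
    """Count copies (0, 1, or 2) of effect_allele in a genotype such as 'A/C'."""
    return genotype.split("/").count(effect_allele)


def additive_slope(dosages, phenotypes):
    """Ordinary least-squares slope of phenotype on allele dosage."""
    x_bar, y_bar = mean(dosages), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    return sxy / sxx


# Hypothetical samples: (genotype, intracranial volume in cm^3). Not real data.
samples = [
    ("C/C", 1500.0), ("C/C", 1520.0),
    ("A/C", 1470.0), ("A/C", 1490.0),
    ("A/A", 1440.0), ("A/A", 1460.0),
]
dosages = [allele_dosage(g, "A") for g, _ in samples]
volumes = [v for _, v in samples]
slope = additive_slope(dosages, volumes)
print(f"Estimated change in volume per copy of allele A: {slope:.1f} cm^3")
```

A real analysis would also adjust for covariates such as age and sex and would report a P-value for the slope.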

Page 12

Genome-Wide Association Studies (GWAS)

Identify loci for phenotypes or diseases using genotyping arrays throughout entire genome

Study association of polymorphisms with complex human traits

Meta-analysis across multiple studies
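One common way to combine per-study results is inverse-variance weighting. The sketch below assumes a fixed-effect model, which the slide does not specify; the function name and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights.

    Returns (pooled effect, pooled standard error, z-score).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical effect sizes for one SNP from two studies. Not real data.
pooled, pooled_se, z = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.4f}, z = {z:.2f}")
```

Pooling shrinks the standard error below that of any single study, which is why meta-analysis helps detect small SNP effects.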

Page 13

One SNP: "candidate gene" approach, e.g., BDNF

Genome-wide Association Study: screening 500,000 SNPs – 2,000,000 SNPs

NIH-funded database of genotypes and phenotypes enabling searches to find where in the genome a variant is associated with a trait.

Figure: plot of -log10(P-value) against position along genome; intracranial volume plotted by genotype (C/C, A/C, A/A).
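Screening this many SNPs means the significance threshold must account for multiple testing. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05, which is a common convention and not something the slide states. It shows where the threshold falls on the -log10(P-value) axis.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def neg_log10(p_value):
    """Convert a P-value to the -log10 scale used on the plot's vertical axis."""
    return -math.log10(p_value)


for n_snps in (500_000, 2_000_000):
    threshold = bonferroni_threshold(n_snps)
    print(f"{n_snps:>9,} SNPs: P < {threshold:.1e}, "
          f"-log10(P) > {neg_log10(threshold):.2f}")
```

Bonferroni treats the tests as independent, so it is conservative when neighbouring SNPs are correlated.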

Page 14

Applications of GWAS

Identify genetic variants that affect brain measures: volumetric, fiber integrity, connectivity

Risk genes

Early biomarkers of disease

Page 15

What is a risk gene? A common genetic variant related to a brain measure, or a disease, or a trait such as obesity, found by searching the genome

23 pairs of chromosomes

In a particular part of chromosome 5 there are many genes

Within a gene there are exons, introns, and SNPs

Single Nucleotide Polymorphism (SNP)

99.9% of DNA is the same for all people. DNA variation causes changes in predisposition to disease and in brain structure.

One type of variation is a single nucleotide polymorphism (SNP): a single letter change in the DNA code
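A single letter change can be located by comparing two aligned sequences position by position. The sketch below uses the two 12-base sequences from the deck's SNP figure; the function name is hypothetical.

```python
def differing_positions(seq_a, seq_b):
    """Return (index, base_a, base_b) for every position where two aligned
    sequences of equal length differ. Indices are 0-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


reference = "CTAGTCAGCGCT"
variant = "CTAGTAAGCGCT"
print(differing_positions(reference, variant))  # prints [(5, 'C', 'A')]
```

Real variant calling works on aligned sequencing reads and handles insertions and deletions, which this equal-length comparison does not.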

Page 16

GRIN2B Risk Allele

Glutamate receptor, signaling pathway

Genetic polymorphism of GRIN2B gene

Associated with reductions of brain white matter integrity

Bipolar disorder

Obsessive compulsive disorder

Page 17

Jason L. Stein, Xue Hua PhD, Jonathan H. Morra PhD, Suh Lee, April J. Ho, Alex D. Leow MD PhD, Arthur W. Toga PhD, Jae Hoon Sul, Hyun Min Kang, Eleazar Eskin PhD, Andrew J. Saykin PsyD, Li Shen PhD, Tatiana Foroud PhD, Nathan Pankratz, Matthew J. Huentelman PhD, David W. Craig PhD, Jill D. Gerber, April Allen, Jason J. Corneveaux, Dietrich A. Stephan, Jennifer Webster, Bryan M. DeChairo PhD, Steven G. Potkin MD, Clifford R. Jack Jr MD, Michael W. Weiner MD, Paul M. Thompson PhD, and the ADNI (2010). Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer's Disease. NeuroImage, 2010.

GRIN2B genetic variant is associated with a 2.8% temporal lobe volume deficit

GRIN2B is over-represented in AD; it could be considered an Alzheimer’s disease risk gene, but this needs replication

Page 18

Stein et al., and the ADNI (2010). Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer's Disease. NeuroImage, 2010.

GRIN2B genetic variant associates with brain volume in these regions; 2.8% more temporal lobe atrophy

Page 19

Alzheimer’s risk gene carriers (CLU-C) have lower fiber integrity even when young (N=398), 50 years before disease typically hits

Voxels where CLU allele C (at rs11136000) is associated with lower FA after adjusting for age, sex, and kinship in 398 young adults (68 T/T; 220 C/T; 110 C/C). FDR critical p = 0.023. Left hemisphere is on the right.

Braskie et al., Journal of Neuroscience, May 4 2011
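An "FDR critical p" is the largest P-value still declared significant once the false discovery rate is controlled across all voxels. The slide does not say which procedure produced it; the sketch below assumes the Benjamini-Hochberg step-up procedure at q = 0.05. The function name and the P-values are hypothetical.

```python
def bh_critical_p(p_values, q=0.05):
    """Benjamini-Hochberg critical P-value.

    Sort the m P-values and find the largest p(k) with p(k) <= (k / m) * q.
    Every test with a P-value at or below it is declared significant.
    Returns None when no test passes.
    """
    m = len(p_values)
    critical = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= (k / m) * q:
            critical = p
    return critical


# Hypothetical per-voxel P-values. Not real data.
voxel_p = [0.001, 0.008, 0.039, 0.041, 0.042,
           0.060, 0.074, 0.205, 0.212, 0.216]
print(bh_critical_p(voxel_p))
```

With these inputs only the two smallest P-values pass, so the critical value is the second smallest.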

Page 20

Effect is even stronger for carriers of a schizophrenia risk gene variant, trkA-T (N=391 people)

a. p values indicate where NTRK1 allele T carriers (at rs6336) have lower FA after adjusting for age, sex, and kinship in 391 young adults (31 T+; 360 T-). FDR critical p = 0.038.

b. Voxels that replicate in 2 independent halves of the sample (FDR-corrected). Left is on Right.

Braskie et al., Journal of Neuroscience, May 2012

Page 21

Neural Fiber Integrity: Fractional Anisotropy (FA)

Applied to diffusion tensor MRI; computed from the eigenvalues of the diffusion tensor

FA = 0 means diffusion is totally unrestricted (the same in every direction)

FA = 1 means diffusion is restricted to only one direction

FA reflects fiber density, axonal diameter, or myelination of white matter
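The two endpoints can be checked against the standard definition of fractional anisotropy in terms of the three eigenvalues of the diffusion tensor. The formula is the usual textbook one, not taken from the slide, and the function name is hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three diffusion-tensor eigenvalues.

    FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2))
    """
    mean_l = (l1 + l2 + l3) / 3.0
    spread = (l1 - mean_l) ** 2 + (l2 - mean_l) ** 2 + (l3 - mean_l) ** 2
    magnitude = l1 ** 2 + l2 ** 2 + l3 ** 2
    if magnitude == 0:
        return 0.0  # no diffusion at all; define FA as 0
    return math.sqrt(1.5 * spread / magnitude)


print(fractional_anisotropy(1.0, 1.0, 1.0))  # equal in all directions
print(fractional_anisotropy(1.0, 0.0, 0.0))  # confined to one direction
```

Equal eigenvalues give FA = 0 and a single non-zero eigenvalue gives FA = 1, matching the two cases on the slide.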

Page 22

Kohannim O, et al. Predicting white matter integrity from multiple common genetic variants. Neuropsychopharmacology 2012, in press.

COMT

HFE

CLU

NTRK1

ErbB4

BDNF

SNPs can predict variance in brain integrity

Neuro-chemical genes

Neuro-developmental genes

Neuro-degenerative risk genes

A significant fraction of variability in white matter structure of the corpus callosum (measured with DTI) is predictable from SNPs

Page 23

Big Data

26,000 whole brain MR images

> 500,000 single nucleotide polymorphisms (SNPs)

Analyze each voxel of the entire brain and search for genetic variants of the whole genome at each brain voxel

Select only the most associated SNP at each voxel, by analyzing P-values through an inverse beta transformation
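One reading of the "inverse beta transformation" is the following. Under the null hypothesis, the smallest of N independent uniform P-values follows a Beta(1, N) distribution, whose cumulative distribution function is 1 - (1 - p)^N. Passing the minimum P-value at a voxel through that function gives a P-value corrected for having picked the best SNP. The sketch assumes independent tests; the `n_effective` parameter (an effective number of independent SNPs) and the function name are hypothetical and not taken from the slide.

```python
import math


def best_snp_corrected_p(p_values, n_effective=None):
    """Pick the most associated SNP at one voxel and correct its P-value.

    The minimum of n independent uniform P-values is Beta(1, n) distributed,
    so the corrected P-value is the Beta(1, n) CDF at p_min: 1 - (1 - p_min)^n.
    Returns (index of best SNP, corrected P-value).
    """
    n = len(p_values) if n_effective is None else n_effective
    best_index = min(range(len(p_values)), key=lambda i: p_values[i])
    p_min = p_values[best_index]
    if p_min >= 1.0:
        return best_index, 1.0
    # expm1/log1p keep precision when p_min is very small.
    corrected = -math.expm1(n * math.log1p(-p_min))
    return best_index, corrected


# Hypothetical P-values for three SNPs at a single voxel. Not real data.
index, corrected = best_snp_corrected_p([0.5, 0.01, 0.9])
print(index, round(corrected, 6))
```

Because SNPs in linkage disequilibrium are correlated, using the raw SNP count for n overcorrects, which is why an effective number of tests is exposed as a parameter.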

Page 24

Genetic clustering boosts GWAS power

1. Many top hits now reach genome-wide significance (N=472) and replicate

2. Several SNPs affect multiple ROIs

3. Can form a network of SNPs that affect similar ROIs

4. It has a small-world, scale-free topology

(for more, see Chiang et al., J. Neurosci., 2012)

Page 25

Population-level Data Integration: Electronic Medical Records, Genotypes and Phenotypes

Page 26

eMERGE

Goal: research to combine DNA biorepositories with EMR for large-scale association studies of genetics and phenotypes; to incorporate genetic variants into EMR for use in clinical care

Page 27

Network Members

Page 28

eMERGE Innovation

Algorithms for electronic phenotyping of clinical conditions identified in EMR

Discoveries of genetic variants in biorepository samples
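Electronic phenotyping is often rule-based: a patient counts as a case when the record combines a diagnosis code with supporting evidence such as a medication or a lab value. The sketch below illustrates that idea only. It is not an eMERGE algorithm, and every code, drug name, lab name, and cutoff in it is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Minimal stand-in for the structured part of an EMR."""
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    lab_values: dict = field(default_factory=dict)


def is_case(record, dx_codes, drug_names, lab_name, lab_cutoff):
    """Case = qualifying diagnosis AND (qualifying drug OR lab at/above cutoff)."""
    has_dx = bool(record.diagnosis_codes & dx_codes)
    has_drug = bool(record.medications & drug_names)
    has_lab = record.lab_values.get(lab_name, float("-inf")) >= lab_cutoff
    return has_dx and (has_drug or has_lab)


# Hypothetical rule and patients, invented for illustration.
rule = dict(dx_codes={"DX-001"}, drug_names={"drug-a"},
            lab_name="lab-x", lab_cutoff=6.5)
patients = [
    PatientRecord("p1", {"DX-001"}, {"drug-a"}, {}),
    PatientRecord("p2", {"DX-001"}, set(), {"lab-x": 5.0}),
    PatientRecord("p3", set(), {"drug-a"}, {"lab-x": 7.0}),
]
cases = [p.patient_id for p in patients if is_case(p, **rule)]
print(cases)
```

Cases selected this way can then be linked to biorepository genotypes for association testing.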

Page 29

Big Data to Knowledge