Alison Yao, Ph.D.
Program Officer, Office of Genomics and Advanced Technologies
Division of Microbiology and Infectious Diseases
National Institute of Allergy and Infectious Diseases
National Institutes of Health
July 2014
*
BIG DATA
NIH Big Data to Knowledge Initiative for Research Data
BD2K
*
Genomic, Other ‘Omic, Imaging, Phenotypic, Clinical, Exposure
Courtesy of NHGRI
Primary data, derived data, interpreted data/knowledge
Experimental metadata, analytical metadata
*BIG DATA
Courtesy of Richard Scheuermann
*BIG DATA
*Lots and lots of data in individual labs
Lab 1, Lab 2, Lab 3, Lab 4, Lab 5, Lab 6
Courtesy of Michael F. Huerta
Scientific Data Council (SDC)
Big Data to Knowledge (BD2K)
Associate Director for Data Science (ADDS)
*NIH is Tackling the ‘Big Data’ Problem
Courtesy of NHGRI
*Big Data to Knowledge (BD2K):
Major trans-NIH initiative addressing an NIH imperative and key roadblock
Aims to be catalytic to biomedical research and synergistic across different scientific communities
Overarching goal: BD2K aims to develop the new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science.
• Advance the science & technology of biomedical big data: data computing centers and software development
• Facilitate the broad use of biomedical research data: data standards, catalog, and data sharing policies
• Enhance & develop the workforce in biomedical big data: training
*NIH BD2K Initiative
*Impact of NIH BD2K
*Increased data sharing will make data available
*Promotion of standards will make data usable
*Data will be brought into the research ecosystem
*Discoverable, citable & linked to data, tools & literature
*Data science & tools will enable scientific innovation
BD2K will make the biomedical research enterprise more data centric
Transforming biomedical research
Today: hypothesis driven
Tomorrow: data centric
*The DDICC will support
*Data Discoverability
*Data Access
*Data Citation
*Approaches
*Community engagement and outreach
*Task Forces
*Pilot Projects
*Deliverables:
*White paper and examples to help inform development of a fully functional DDI
*NIAID/DMID Genomics Program
Sequencing, Functional Genomics, Proteomics, Structural Genomics, Systems Biology, Bioinformatics
Genomic Sequencing Centers, Functional Genomic Research Centers, Clinical Proteomics Centers, Structural Genomics Centers, Systems Biology Centers, Bioinformatics Resource Centers
To address key questions in microbiology and infectious disease
Genomic Research Resources: Genomic/Omics Data Sets, Databases, Bioinformatics Tools, Biomarkers, 3D Structures, Protein Clones, Predictive Models
*Genome Sequencing Centers
Systems Biology Centers
Structural Genomics Centers
Clinical Proteomics Centers
Bioinformatics Resource Centers (BRCs)
*Bioinformatics Resource Centers (BRCs)
Goal: Provide integrated bioinformatics resources in support of basic and applied infectious diseases research
• Data and metadata management and integration solutions
• Computational analysis and visualization tools
• Work spaces and web interfaces
• Training and outreach activities
• Free bioinformatics services
• Rapid response to new and emerging pandemic threats
*Bioinformatics Resource Centers (BRCs)
*
Data Management & Integration
Web interfaces and workspaces
Computational analysis tools
Software Engineering
Social Engineering
Collaboration Bioinformatics Services
Training Workshop
*Data Tools
*
BRCs
DBPs
ICEMR
CEIRS
*Key Features:
*~16,000 bacterial genomes and standardized annotations
*Free bioinformatics services
* Genome annotation service (RAST)
* Comparative genome analysis
* Integrated genomic and omics data, metadata and tools
*Comparative analyses and interactive visualizations
*Personal workspace
*TB Portal
*Genomes, Metadata, Phylogenetic Trees, Genes & Proteins
*Protein-protein interactions, Structures, Pathways
Proteomics, ChIP-Seq data coming January 2014
Transcriptomics (Microarray, RNA-Seq)
*Reference genomes (H37Rv)
Gene/Protein search
Analysis Tools
tb.patricbrc.org
Omics Data
*
Data Generation
• Infectious disease community
• CEIRS
Data Processing
• Bioinformatics centers, IRD
• CEIRS data coordinating center
Knowledge Presentation
• Open access
• Visualization
• Analysis
Training, Services, Collaboration
Query, Analysis, Insight, Hypothesis
DMID/OGAT
Maria Giovanni
Valentina Di Francesco
Julia Puzak
Eun Mi Lee
Punam Mathur
Malu Polanski
Vivien Dugan
Christina Giblin
The Influenza Research Database Team
J. Craig Venter Institute
Northrop Grumman Health Solutions
Vecna Technologies
Los Alamos National Laboratory
University of California Davis
*Acknowledgment