Post on 04-Aug-2020

The NCI Cancer Research Data Commons

Allen Dearry, Ph.D., Program Director

Center for Biomedical Informatics and Information Technology

CI4CC 10.25.2017

Agenda

1. Background
2. Overview – NCI Cancer Research Data Commons
3. Commons Framework
4. Discussion

Background

2006-2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors

Omics Characterization (10,000+ patient tumors and increasing)

Courtesy of P. Kuhn (USC)

Precision Medicine Initiative (PMI)

Precision Medicine is a grand challenge, requiring:

• Deep biological understanding
• Advances in scientific methods, instrumentation, and technology
• Advances in data management and computation
• Ability to apply those advances to drive research and treatment
• Ability to securely share data across domains, institutions, and stakeholders

Cancer research and care generate detailed data that are critical to create a learning health system for cancer.

Key tenet of the PMI: secure, responsible access to high-quality data.

The PMI was announced during the State of the Union Address, 2015.

Basic Ingredients for PMI Big Data

• Open Science. Supporting Open Access, Open Data, Open Source Software, and Data Liquidity for the cancer community

• Standardization through terminology, CDEs, and CRFs

•  Interoperability by exposing existing knowledge through appropriate integration of ontologies, vocabularies, taxonomies, and data standards

• Sustainable models for informatics infrastructure, services, data, metadata, curation

NIH Genomic Data Sharing Policy

https://gds.nih.gov/

Went into effect January 25, 2015

NCI guidance:

http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data

Guiding Principle:

The greatest public benefit will be realized if large-scale genomic data are made available in a timely manner to the largest possible number of investigators. For human data, data are made available under terms and conditions consistent with the informed consent provided by individual participants.

The Beau Biden Cancer Moonshot℠

Overarching goals – Jan, 2016

• Accelerate progress in cancer, including prevention & screening
  • From cutting edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and private sector
• Enhance data sharing

Blue Ribbon Panel – October, 2016

• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 3D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
• Full report: www.cancer.gov/brp

National Cancer Data Ecosystem Recommendation

Overall goal: “Enable all participants across the cancer research and care continuum to contribute, access, combine and analyze diverse data that will enable new discoveries and lead to lowering the burden of cancer.”

• Envisioned to consist of multiple components
  • Fundamental infrastructure to connect the components and ensure interoperability
    • Common APIs
    • Data schemas
    • Common data dictionaries
    • Enhanced cloud computing platforms
  • Components such as repositories, analytics services, and interactive portals
• The ability to link diverse data types and data sources is fundamental to interoperability of the Cancer Data Ecosystem.

Changing the Conversation around Data Sharing

• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we adopt/adapt or create data standards?
• How do we make more data machine readable?

National Cancer Data Ecosystem

NCI Cancer Research Data Commons

NIH Data Commons Pilot

Data Commons co-locate data, storage and computing infrastructure, and frequently used tools for analyzing and sharing data to create an interoperable resource for the research community.

Why Develop a Cancer Research Data Commons?

• A key component of a learning National Cancer Data Ecosystem
• Making research data available for discovery, validation, new therapies
• Maximizing the impact, reuse, and reproducibility of cancer research
• Facilitating innovation of methods and tools for research
• Promoting research collaborations
• Changing incentives for data sharing

Reduce the risk, improve early detection, outcomes, and survivorship in cancer

NCI Cancer Research Data Commons - Vision

The NCI Genomic Data Commons

• Unify fragmentary repositories
• Support the receipt, quality control, integration, storage, and redistribution of standardized genomic data sets derived from cancer research studies
• Harmonization of raw sequence both from existing and new cancer research programs
• Application of state-of-the-art methods of generating derived genomic data
• Provide the foundation for:
  • Identification of high- and low-frequency cancer drivers
  • Defining genomic determinants of response to therapy
  • Clinical trial cohorts sharing targeted genetic lesions

Three NCI Genomics Cloud Pilots

Broad Institute
• PI: Gad Getz, Anthony Philippakis
• Google Cloud
• Firehose in the cloud including Broad best practices workflows
• http://firecloud.org

Institute for Systems Biology
• PI: Ilya Shmulevich
• Google Cloud
• Leverage Google infrastructure; Novel query and visualization
• http://cgc.systemsbiology.net/

Seven Bridges Genomics
• PI: Brandi Davis-Dusenbery
• Amazon Web Services
• Interactive data exploration; >30 public pipelines
• http://www.cancergenomicscloud.org

Timeline phases: Design/Build I, Design/Build II, Evaluation, Extension, Cloud Resources

Timeline dates: Sept 2014, April 2015, Jan 2016, Sept 2016, October 2017

Original Goals of the Pilots Remain Relevant

Democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.

Provide:
• Access to large genomic data sets without need to download
• Access to popular pipelines and visualization tools
• Ability for researchers to bring their own tools and pipelines to the data
• Ability for researchers to bring their own data and analyze in combination with existing genomic data
• Workspaces, for researchers to save and share their data and results of analyses

GDC / Cloud Resources: Today

Diagram labels:
• Researchers
• Web Interface (GDC); Web Interface (Cloud Resources)
• APIs (GDC); APIs (Cloud Resources)
• GDC: Data Submission & Harmonization
• Genomic Data Commons: Harmonization, Visualization, & Download
• NCI Cloud Resources (SBG CGC, Broad FireCloud, ISB CGC): Visualization, Compute, Pipelines, Workspaces
• Authentication & Authorization thru eRA Commons & dbGaP

Data Commons Framework

What is it?
• Reusable, expandable framework for the Data Commons
• Defines the core principles and structure of a Data Commons
• Provides reusable components that can be leveraged across the Data Commons

Components
• Secure user authentication and authorization
• Metadata validation and tools
• Domain-specific, extensible data models
• API and container environment for tools and pipelines
• Access to computational workspaces for storing data, tools, and results
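The pairing of metadata validation with domain-specific, extensible data models can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the allowed values, and the `FieldSpec`, `extend`, and `validate` helpers are hypothetical and do not represent an actual NCI data dictionary or framework API. The idea shown is only that a shared core dictionary can be extended by a node with its own fields, and that submitted records are checked against the combined dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One entry in a hypothetical data dictionary."""
    required: bool = False
    allowed: frozenset = frozenset()  # empty set means any value is accepted


# Hypothetical core dictionary shared by all nodes (not an actual NCI schema).
CORE_DICTIONARY = {
    "case_id": FieldSpec(required=True),
    "disease": FieldSpec(required=True),
    "data_type": FieldSpec(
        required=True,
        allowed=frozenset({"genomic", "imaging", "proteomic"}),
    ),
}


def extend(base, extra):
    """Return a node-specific dictionary: core fields plus domain fields."""
    merged = dict(base)
    merged.update(extra)
    return merged


def validate(record, dictionary):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            if spec.required:
                problems.append(f"missing required field: {name}")
            continue
        if spec.allowed and record[name] not in spec.allowed:
            problems.append(f"invalid value for {name}: {record[name]!r}")
    for name in record:
        if name not in dictionary:
            problems.append(f"unknown field: {name}")
    return problems


# A hypothetical imaging node adds one domain-specific field to the core model.
IMAGING_DICTIONARY = extend(
    CORE_DICTIONARY,
    {"modality": FieldSpec(required=True, allowed=frozenset({"CT", "MRI", "PET"}))},
)

good = {"case_id": "C-001", "disease": "glioma", "data_type": "imaging", "modality": "MRI"}
bad = {"case_id": "C-002", "data_type": "imaging", "modality": "X"}

print(validate(good, IMAGING_DICTIONARY))
print(validate(bad, IMAGING_DICTIONARY))
```

Keeping the core fields in one shared dictionary is what would let records from different nodes be compared later; each node only adds what is specific to its domain.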

Data Commons Framework – Why?

• Leverage work already completed by GDC and Cloud Pilots/Resources.
• Develop infrastructure and foundation for the Data Commons and nodes as they are created.
• Ensure consistency and interoperability from the start, maximize future data sharing.
• Design modular, interoperable components (data access services, indexing and search, workspaces, workflow and tool stores, portals and UIs) that can be flexible and assembled into diverse data environments.
• Optimize ability to integrate new data types.
• Interrelate with other Commons developments: NIH, CZI...

NCI Cancer Research Data Commons (NCRDC)

Nodes:
• Genomic Data Commons Node: GDC
• Imaging Data Commons Node: IDC
• Proteomic Data Commons Node: PDC

APIs

Data Commons Framework – Modular, Flexible Core Services
• Authentication and Authorization
• Metadata Validation Tools
• Data Models
• User Workspaces
• Container Environment

GDC / Cloud Resources: Near Term - Moving Towards a Commons Framework

Diagram labels:
• Researchers
• Web Interface (GDC); Web Interface (Cloud Resources)
• APIs (GDC); APIs (Cloud Resources)
• GDC: Data Submission & Harmonization
• DockStore; Analysis resources
• Centrally-managed copies of the data, mirrored in the commercial clouds: GDC@GCP, GDC@AWS, GDC@Azure
• Centrally-managed authentication and authorization thru eRA Commons and dbGaP
• Cloud Resources (SBG CGC, Broad FireCloud, ISB CGC) continue to provide data access, analytic tools, workspace

The NCI Cancer Research Data Commons: A virtual, expandable infrastructure

Diagram labels:
• Data domains in the NCI Cancer Research Data Commons: Genomic (GDC), Proteomics, Imaging, Clinical, Functional, Cancer Models, Population
• Data Contributors
  • Standardized data submission and Q/C
  • Controlled vocabularies
  • Harmonization by subject matter experts
• Users: Biologists / Clinical Researchers, Clinicians and Patients, Tool / Algorithm Developers, Computational Scientists
  • Secure data access through API or web UI
  • Query across data domains
  • Analytics, elastic compute, visualization
• Authentication & Authorization
• Layers, connected through APIs:
  • Community Presentation
  • Analytics
  • Multi-modal data aggregation (Cancer Data Aggregator: aggregate by case, sample, study, disease, tissue, etc.)
  • Data Commons Repositories / Nodes: Genomics, Imaging, Proteomics, Clinical
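The Cancer Data Aggregator role named in the diagram (aggregate by case, sample, study, disease, tissue, etc.) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the aggregator's design: the record layout, the `case_id` key, the file names, and the `aggregate` and `cases_with_all` helpers are all hypothetical, and in-memory lists stand in for what real nodes would return through their APIs.

```python
from collections import defaultdict

# Hypothetical records as three nodes might return them; the field names
# and identifiers are invented for illustration only.
GENOMIC = [
    {"case_id": "C-001", "file": "c001.bam"},
    {"case_id": "C-002", "file": "c002.bam"},
]
IMAGING = [{"case_id": "C-001", "file": "c001_mri.dcm"}]
PROTEOMIC = [
    {"case_id": "C-002", "file": "c002.raw"},
    {"case_id": "C-003", "file": "c003.raw"},
]


def aggregate(nodes, key="case_id"):
    """Group file names from every node under the shared key value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for node_name, records in nodes.items():
        for record in records:
            grouped[record[key]][node_name].append(record["file"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {k: dict(v) for k, v in grouped.items()}


def cases_with_all(aggregated, required_nodes):
    """Return sorted keys that have data in every one of the required nodes."""
    required = set(required_nodes)
    return sorted(k for k, by_node in aggregated.items() if required <= set(by_node))


by_case = aggregate({"genomic": GENOMIC, "imaging": IMAGING, "proteomic": PROTEOMIC})

print(by_case["C-001"])
print(cases_with_all(by_case, ["genomic", "proteomic"]))
```

The sketch only works because every node uses the same identifier for a case, which is the practical reason shared data dictionaries and harmonized metadata matter for querying across data domains.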

Governance and Outreach

• Governance process to be established, including Scientific and Technical Review Board and Steering Committee
  • Structured process for decisions, interactions, roles
• Outreach and collaboration
  • Working with NIH and other ICs on related initiatives/Data Commons, as well as external groups such as Chan Zuckerberg
  • Participating on NIH and interagency working groups and on PMI- and Moonshot-related projects
  • Plans for workshops and RFIs to get community input, feedback, and participation

Acknowledgements

Cloud Resources Team Leads
• Gad Getz, Ph.D. - Broad Institute
• Ilya Shmulevich, Ph.D. - ISB
• Brandi Davis-Dusenbery, Ph.D. - Seven Bridges

NCI CBIIT Team
• Durga Addepalli, Ph.D.
• Allen Dearry, Ph.D.
• Juli Klemm, Ph.D.
• Tanja Davidsen, Ph.D.
• Izumi Hinkson, Ph.D.
• Betsy Hsu, Ph.D.
• Stephen Jett, Ph.D.
• John Otridge, Ph.D.
• Sima Pandya
• Eve Shalley
• Steve Tsang, Ph.D.

Framework Team
• Robert Grossman, Ph.D. - University of Chicago
• Phillis Tang
• Christina Yung

NCI Center for Cancer Genomics
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.

NCI Office of Cancer Clinical Proteomics Research
• Henry Rodriguez, Ph.D.
• Chris Kinsinger, Ph.D.

NCI Cancer Imaging Program
• Paula Jacobs
• John Freymann
• Justin Kirby

NCI Leadership
• Doug Lowy, M.D.
• Warren Kibbe, Ph.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.

www.cancer.gov www.cancer.gov/espanol