NDS Relevant Update from the NIH Data Science (ADDS) Office
AWS Government, Education, and Nonprofit Symposium | Washington, DC | June 25-26, 2015
Phil Bourne, Ph.D., FACMI
Associate Director for Data Science (ADDS)
How Can NDS Succeed?
• Be at the right place at the right time
• Bring together all the right stakeholders – there are groups missing now, e.g. application scientists, publishers
• Define very well the problem(s) you are trying to solve
• Start with pilots, but proceed to a soup-to-nuts application that has value and can be sustained
How can NDS Interface with the NIH ….
ADDS Mission Statement
To use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability
A couple of announcements …
http://www.nih.gov/news/health/oct2015/od-20.htm
ADDS Strategy
• Discovery and Innovation
  – Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development
  – Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process
  – Contribute to policies & processes involving data that further the NIH mission
• Leadership
  – Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability
  – To foster a sustainable, efficient, and productive data science ecosystem
Sustainability
Workforce Development
Discovery & Innovation
Policy & Process
Leadership
ADDS Strategy
• Discovery and Innovation
  – Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development
  – Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process
  – Contribute to policies & processes involving data that further the NIH mission
• Leadership
  – Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability
  – To foster a sustainable, efficient, and productive data science ecosystem: The Commons
Some Developments…
• Centers, standards, training coordination centers off and running
• Looking at funding reference datasets
• Hackathons and more…
• NLM 2.0
Commons Update: enabling the digital enterprise
What is The Commons?
• Treats products of research – data, methods, papers etc. as digital objects
• These digital objects exist in a shared virtual space
• Digital objects conform to FAIR principles:
  – Findable
  – Accessible (and usable)
  – Interoperable
  – Reusable
The Commons: Components
• Computing environment
  – Cloud and/or HPC: supports access, utilization, sharing and storage of digital objects
• Methods for interoperability
  – Enable connectivity, shareability and interoperability between digital objects
  – APIs, containers (Docker etc.)
• Digital object compliance model
  – Describes the properties of digital objects that enable them to be discoverable and shareable
  – Metadata, UIDs, clear access controls (human subject data)
• Indexing
  – Means to find and catalog digital objects
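The indexing component can be illustrated with a minimal sketch, assuming an in-memory catalog keyed by identifier with keyword search over metadata. The class name `CommonsIndex`, the field names, and the example identifiers are hypothetical; none of them come from the slides or from any NIH specification.

```python
from collections import defaultdict


class CommonsIndex:
    """Toy catalog of digital objects (hypothetical; not an NIH API)."""

    def __init__(self):
        self._records = {}                 # uid -> metadata dict
        self._keywords = defaultdict(set)  # lowercase word -> set of uids

    def register(self, uid, metadata):
        """Catalog an object and index every word in its metadata values."""
        self._records[uid] = metadata
        for value in metadata.values():
            for word in str(value).lower().split():
                self._keywords[word].add(uid)

    def find(self, query):
        """Return sorted uids whose metadata contain every word in the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._keywords.get(w, set()) for w in words))
        return sorted(hits)


index = CommonsIndex()
index.register("obj-001", {"title": "Microbiome reads", "type": "dataset"})
index.register("obj-002", {"title": "Microbiome assembly pipeline", "type": "method"})
print(index.find("microbiome"))          # both objects match
print(index.find("microbiome dataset"))  # only the dataset matches
```

A real index would live outside any single provider and search richer metadata; the point here is only that "find" and "catalog" are two sides of one registry.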
The Commons: Components
Computing Environment: Cloud
• The ability to store, share and compute on digital research objects
• Especially useful for large data sets that are not easily computed locally
• Scalable and elastic
• Pay per use: cost effective
• An environment that fosters collaboration
The Commons: Cloud
• Commercial
  – AWS, Google, Microsoft, IBM, others
• Academic
  – OSC (Open Science Cloud), iDASH (HIPAA compliant), The Broad, others
The Commons: HPC
• Supercomputing centers in the US
  – Supported by DOE and NSF
    • NERSC (San Francisco Bay Area)
    • ORNL (Oak Ridge)
    • TACC (Texas)
    • SDSC (San Diego)
    • Argonne (Illinois)
• Optimized, high-performance systems with IT support
The Commons: Interoperability
• Software that supports connectivity and interoperability between digital (data) objects
  – APIs (Application Programming Interfaces)
    • Expose and provide direct access to data
    • Enable data to be passed to analysis tools or pipelines
  – Containers
    • Package and deploy software tools and pipelines to the cloud
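As a rough illustration of the API role described above, the sketch below exposes a stored object as JSON and hands that payload to an analysis step. Everything named here is an assumption made for the example: `get_object`, `mean_read_length`, and the in-memory `_STORE` stand in for a remote Commons service and a real pipeline.

```python
import json

# Stand-in for data held by a Commons provider (hypothetical contents).
_STORE = {
    "obj-001": {"sample": "S1", "read_lengths": [100, 150, 125]},
}


def get_object(uid):
    """Hypothetical API call: return the object as a JSON string."""
    if uid not in _STORE:
        raise KeyError(f"unknown digital object: {uid}")
    return json.dumps(_STORE[uid])


def mean_read_length(payload):
    """Analysis step that consumes the API payload, not the raw store."""
    record = json.loads(payload)
    lengths = record["read_lengths"]
    return sum(lengths) / len(lengths)


result = mean_read_length(get_object("obj-001"))
print(result)
```

The analysis function depends only on the payload format, so the same tool could in principle run against any provider that serves the same structure.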
The Commons: Digital Object Compliance (FAIR)
• Attributes of digital objects in the Commons
  – Initial phase
    • Unique digital object identifiers of some type
    • A minimal set of searchable metadata
    • Physically available in a cloud-based Commons provider
    • Clear access rules (especially important for human subjects data)
    • An entry (with metadata) in one or more indices
  – Future phases
    • Standard, community-based unique digital object identifiers
    • Conform to community-approved standard metadata for enhanced searching
    • Digital objects accessible via open standard APIs
    • Physically and logically available to the Commons
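The initial-phase attributes above read like a checklist, so they can be sketched as one. This is a minimal illustration, assuming each attribute maps to a single field; the `DigitalObject` class, its field names, and the example values are hypothetical and do not represent an actual Commons schema.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record; field names are illustrative only."""
    uid: str = ""
    metadata: dict = field(default_factory=dict)
    cloud_location: str = ""      # where the object physically lives
    access_rule: str = ""         # e.g. "open" or "controlled"
    index_entries: list = field(default_factory=list)


def initial_phase_gaps(obj):
    """Return the initial-phase attributes that the object is missing."""
    checks = {
        "unique identifier": bool(obj.uid),
        "searchable metadata": bool(obj.metadata),
        "available in a cloud-based Commons provider": bool(obj.cloud_location),
        "clear access rules": bool(obj.access_rule),
        "entry in one or more indices": bool(obj.index_entries),
    }
    return [name for name, ok in checks.items() if not ok]


complete = DigitalObject(
    uid="obj-001",
    metadata={"title": "Example dataset"},
    cloud_location="provider-a://bucket/obj-001",
    access_rule="controlled",
    index_entries=["discovery-index"],
)
partial = DigitalObject(uid="obj-002", metadata={"title": "No access rule yet"})

print(initial_phase_gaps(complete))  # nothing missing
print(initial_phase_gaps(partial))   # lists the three unmet attributes
```

Reporting the gaps, rather than a bare pass or fail, matches the phased approach on the slide: an object can be admitted to a pilot while its remaining attributes are filled in.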
Commons Pilot Projects
• Evaluating the Commons framework and populating the Commons
  – NIH-funded large resource groups, BD2K groups (cloud)
  – HMP data and tools available in the cloud (AWS)
    • https://aws.amazon.com/datasets/1903160021374413
  – NCI Cloud Pilots & Genomic Data Commons (AWS, Google)
• The Cloud Credits: business model for using cloud resources
Commons Credits (business model)
[Diagram. Labels: NIH; Investigator; The Commons (infrastructure) with Cloud Provider A, Cloud Provider B, Cloud Provider C; Discovery Index; Indexes; "Provides credits"; "Uses credits in the Commons"; "Enables Search"; "Option: Direct Funding"]
Cloud Credits: Pros and Cons
Pros
• Cost effective: only pay for IT support used
• Drives competition: better services at lower cost
• Supports data access and sharing by driving science into the Commons
• Can help determine metrics of data object usage
• Facilitates public-private partnership
Cons
• Never been tried, so we don't have data about likelihood of success
• Cost Models: Predicated prices among providers
• Service Providers: Predicated on service providers willing to make the investment to become conformant
• Persistence: The model is 'Pay As You Go', which means if you stop paying it stops going
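The credits model can be sketched as a simple ledger: an investigator receives a grant of credits, spends them with any conformant provider, and service stops when the balance runs out. This is only an illustration under assumed rules; the `CreditAccount` class, the provider names, and the amounts are hypothetical, and the slides do not specify how credits would actually be accounted for.

```python
from collections import Counter


class CreditAccount:
    """Hypothetical ledger for one investigator's Commons credits."""

    def __init__(self, granted):
        self.balance = granted
        self.usage_by_provider = Counter()

    def spend(self, provider, amount):
        """Charge usage to a provider; refuse once credits run out."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            return False          # pay as you go: no credits, no service
        self.balance -= amount
        self.usage_by_provider[provider] += amount
        return True


account = CreditAccount(granted=100)
print(account.spend("Provider A", 60))
print(account.spend("Provider B", 30))
print(account.spend("Provider A", 20))   # exceeds the remaining balance
print(account.balance, dict(account.usage_by_provider))
```

The per-provider counter reflects the "metrics of data object usage" point, and the refused third charge reflects the persistence concern: once payment stops, so does the service.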
NIH… Turning Discovery Into Health
[email protected]
https://datascience.nih.gov/
@pebourne