Perspectives from the NIH Associate Director for Data Science (ADDS) Office
Transcript of Perspectives from the NIH Associate Director for Data Science (ADDS) Office
AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015
Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science Technologies & Innovation
NIH Office of the Associate Director for Data Science (ADDS)
NIH Addresses Big Data
• In response to the incredible growth of large biomedical (digital) datasets, the Director of NIH established a special Data and Informatics Working Group (DIWG).
Volume, Velocity, Variety, Veracity
US Government Memo Increasing Access to the Results of Federally Funded Scientific Research
In Feb 2013 the US OSTP issued a memo calling for all Federal Agencies to make digital assets from federally funded research available.
Each agency’s public access plan shall:
Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds while:
i) protecting confidentiality and personal privacy,
ii) recognizing proprietary interests, business confidential information, and intellectual property rights and avoiding significant negative impact on intellectual property rights, innovation, and U.S. competitiveness, and
iii) preserving the balance between the relative value of long-term preservation and access and the associated cost and administrative burden.
Provide for the assessment of long-term needs for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats.
Federal Science Policy Changes
• NIH and other Federal Agencies are working to make digital assets from federally funded research available.
• Public Access to Data Memo: http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
• Applies to publications and digital scientific data
• Develop a strategy for:
– leveraging existing archives (where appropriate)
– fostering public-private partnerships with scientific journals relevant to the agency’s research
NIH Response
Establish new data science research and training programs: Big Data to Knowledge (BD2K), 2013
http://datascience.nih.gov/bd2k
Establish a new position: NIH Associate Director for Data Science (ADDS)
Dr. Phil Bourne, 2014
The Future of Open Data
• The nature of the scientific enterprise is evolving.
• Must transform into a digital enterprise (as have other industries: music, financial, advertising)
• To enable biomedical research as a digital enterprise through which new discoveries are made and knowledge generated by maximizing community engagement and productivity.
ADDS Mission Statement
To use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability
ADDS Strategy
• Discovery and Innovation
Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development
Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process
Contribute to policies & processes involving data that further the NIH mission
• Leadership
Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability
To foster a sustainable, efficient, and productive data science ecosystem: The Commons
The Commons: enabling the digital enterprise
What is The Commons?
• Treats products of research (data, methods, papers etc.) as digital objects
• These digital objects exist in a shared virtual space
• Digital objects conform to FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
What is The Commons?
• A shared virtual space where scientists can find, deposit, manage, share and reuse data, software, metadata and workflows
• An environment to find and catalyze the use of shared digital research objects
The Commons: Components
• Computing environment
– cloud and/or HPC
– supports access, utilization, sharing and storage of digital objects
• Methods for interoperability
– enable connectivity, shareability and interoperability between digital objects
– APIs, containers (Docker etc.)
• Digital object compliance model
– describes the properties of digital objects that enable them to be discoverable and shareable
– metadata, UIDs, clear access controls (human subject data)
• Indexing
– means to find and catalog digital objects
Computing Environment: Cloud
The ability to store, share and compute on digital research objects
Especially useful for large data sets that are not easily computed locally
Scalable and Elastic
Pay per use - Cost effective
An environment that fosters collaboration
The Commons: Cloud
Commercial: AWS, Google, Microsoft, IBM, others
Academic: OSC (Open Science Cloud), iDASH (HIPAA compliant), The Broad, others
The Commons: HPC
• Supercomputing centers in the US, supported by DOE and NSF
– NERSC (San Francisco)
– ORNL (Oak Ridge)
– TACC (Texas)
– SDSC (San Diego)
– Argonne (Urbana-Champaign)
• Optimized, high performance systems with IT support
The Commons: Interoperability
• Software that supports connectivity and interoperability between digital (data) objects
– APIs (Application Programming Interfaces)
  • Expose and provide direct access to data
  • Enable data to be passed to analysis tools or pipelines
– Containers
  • Package and deploy software tools and pipelines to the cloud
The Commons: Digital Object Compliance (FAIR)
• Attributes of digital objects in the Commons
– Initial Phase
  • Unique digital object identifiers of some type
  • A minimal set of searchable metadata
  • Physically available in a cloud based Commons provider
  • Clear access rules (especially important for human subjects data)
  • An entry (with metadata) in one or more indices
– Future Phases
  • Standard, community based unique digital object identifiers
  • Conform to community approved standard metadata for enhanced searching
  • Digital objects accessible via open standard APIs
  • Are physically and logically available to the Commons
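The initial-phase attributes listed above could be checked mechanically. The following is a minimal sketch only, not part of the presentation: the `DigitalObject` record, its field names, the `initial_phase_gaps` helper and the sample identifiers are all hypothetical, chosen to mirror the five initial-phase bullets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """Hypothetical record for one digital object (dataset, tool, paper)."""
    identifier: str = ""    # unique digital object identifier of some type
    metadata: Dict[str, str] = field(default_factory=dict)  # minimal searchable metadata
    provider: str = ""      # cloud based Commons provider that holds the object
    access_rule: str = ""   # e.g. "open" or "controlled" (human subjects data)
    indices: List[str] = field(default_factory=list)        # indices listing the object


def initial_phase_gaps(obj: DigitalObject) -> List[str]:
    """Return the initial-phase attributes that the object is still missing."""
    checks = [
        ("unique identifier", bool(obj.identifier.strip())),
        ("searchable metadata", len(obj.metadata) > 0),
        ("cloud based Commons provider", bool(obj.provider.strip())),
        ("clear access rule", bool(obj.access_rule.strip())),
        ("index entry", len(obj.indices) > 0),
    ]
    return [name for name, ok in checks if not ok]


# Made-up example objects; the identifiers are illustrative, not a real scheme.
complete = DigitalObject(
    identifier="example:0001",
    metadata={"title": "Example dataset", "type": "dataset"},
    provider="Cloud Provider A",
    access_rule="controlled",
    indices=["Discovery Index"],
)
incomplete = DigitalObject(identifier="example:0002", provider="Cloud Provider B")

print(initial_phase_gaps(complete))
print(initial_phase_gaps(incomplete))
```

An empty list means the object carries all five initial-phase attributes; the future-phase attributes (community standard identifiers and metadata, open standard APIs) would need checks against whatever standards the community adopts.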
The Commons: PI Perspective
[Diagram: the Commons (infrastructure) spans Cloud Providers A, B and C, with digital object compliance and interoperability software across them. A discovery index indexes the Commons and enables search by the investigator (PI). Callout: 1. Efficiency]
Commons Pilot Projects
• Evaluating the Commons framework and populating the Commons
– NIH funded Large Resource groups, BD2K groups (cloud)
– HMP data and tools available in the cloud (AWS)
  • https://aws.amazon.com/datasets/1903160021374413
– NCI Cloud Pilots & Genomic Data Commons (AWS, Google)
• The Cloud Credits: business model for using cloud resources
Commons Credits (business model)
[Diagram: NIH provides credits to the investigator, who uses the credits in the Commons (infrastructure) across Cloud Providers A, B and C. A discovery index indexes the Commons and enables search. Option: direct funding.]
Cloud Credits: Pros and Cons
Pros
• Cost effective: only pay for IT support used
• Drives competition: better services at lower cost
• Supports data access and sharing by driving science into the Commons
• Can help determine metrics of data object usage
• Facilitates public-private partnership
Cons
• Never been tried, so we don’t have data about likelihood of success
• Cost Models: Predicated prices among providers
• Service Providers: Predicated on service providers willing to make the investment to become conformant
• Persistence: The model is ‘Pay As You Go’ which means if you stop paying it stops going
Thank You.
This presentation will be loaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices
Vivien Bonazzi: [email protected]
Komatsoulis: [email protected]
Secure Genomics Analysis on Amazon Web Services
Angel Pizarro
Scientific Computing, Amazon Web Services
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The shared responsibility model
Audited:
• Facilities
• Physical security
• Compute infrastructure
• Storage infrastructure
• Network infrastructure
• Virtualization layer (Amazon EC2)
• Hardened service endpoints
• Rich AWS Identity & Access Management (IAM) capabilities
Customer + Partner:
• Applications
• Auth & acct management
• Authorization policies
• Proper service configuration
• Network configuration
• Security groups
• OS firewalls
• Operating systems
Together, these let you:
• Re-focus your security professionals on a subset of the problem
• Further reduce that burden through partners
• Take advantage of high levels of uniformity and automation
Genomics Data Security
Store and analyze restricted-access genomics on AWS
bit.ly/aws-dbgap
NIH security best practices
• Physical security
– Data center access and remote administrator access
• Electronic security
– User account security (for example, passwords)
– Use of access control lists (ACLs)
– Secure networking
– Encryption of data in transit and at rest
– OS and software patching
• Data access security
– Authorization of access to data
– Tracking copies; cleaning up after use
Enterprise Applications
• Virtual Desktops
• Collaboration and Sharing
Platform Services
• Databases: Caching, Relational, NoSQL
• Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
• App Services: Queuing, Orchestration, App Streaming, Transcoding, Search
• Deployment & Management: Containers, DevOps Tools, Resource Templates, Usage Tracking, Monitoring and Logs
• Mobile Services: Identity, Sync, Mobile Analytics, Notifications
Foundation Services
• Compute (VMs, Auto Scaling and Load Balancing)
• Storage (Object, Block, and Archive)
• Security & Access Control
• Networking
Infrastructure
• Regions
• Availability Zones
• CDN and Points of Presence
Amazon Virtual Private Cloud (Amazon VPC)
Create secure network configurations for working with sensitive data
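[Diagram: VPC network isolation within an AWS region. VPC 10.0.0.0/16 spans AZ A and AZ B and contains subnet SN 10.0.1.0/24 (DMZ), with an EC2 instance at 10.0.1.11, and subnet SN 10.0.2.0/24 (Private), with an EC2 instance at 10.0.2.12. Other labels: Internet, (23.20.103.11), Internet GW Service, Virtual Gateway]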
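The address plan in the VPC diagram can be sanity-checked with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: the CIDR blocks and instance addresses are the ones shown on the slide, while the `layout_problems` helper and the mapping of each instance to its subnet are illustrative and not taken from the presentation. It does not call any AWS API.

```python
import ipaddress
from itertools import combinations

# Address plan as labeled on the slide.
VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "DMZ": ipaddress.ip_network("10.0.1.0/24"),
    "Private": ipaddress.ip_network("10.0.2.0/24"),
}
# Assumed placement of each EC2 instance, inferred from its address.
INSTANCES = {
    "10.0.1.11": "DMZ",
    "10.0.2.12": "Private",
}


def layout_problems(vpc, subnets, instances):
    """Return a list of human-readable problems with the address plan."""
    problems = []
    # Every subnet must sit inside the VPC block.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"subnet {name} ({net}) is outside VPC {vpc}")
    # Subnets must not overlap each other.
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"subnets {name_a} and {name_b} overlap")
    # Every instance address must fall inside its assigned subnet.
    for addr, subnet_name in instances.items():
        if ipaddress.ip_address(addr) not in subnets[subnet_name]:
            problems.append(f"instance {addr} is not in subnet {subnet_name}")
    return problems


problems = layout_problems(VPC, SUBNETS, INSTANCES)
public_addr = ipaddress.ip_address("23.20.103.11")

print(problems)               # an empty list means the plan is consistent
print(public_addr in VPC)     # the public address is not part of the VPC range
print(public_addr.is_private)
```

A check like this only validates the arithmetic of the plan; isolation itself depends on route tables, security groups and network ACLs configured in the VPC.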
Encrypt your data prior to sending to AWS
[Diagram: your applications in your data center and your applications in Amazon EC2 send encrypted data to AWS services: Amazon S3, Amazon Glacier, Amazon Redshift, Amazon Elastic Block Store]
Encryption: a brief primer
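[Diagram: key hierarchy. Hardware/software encrypts plaintext PHI with a symmetric data key, producing encrypted PHI (encrypted data in storage). A master key encrypts the symmetric data key, producing an encrypted data key. Two elements of the diagram are marked "?"]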
Encryption of AWS storage services
Amazon S3
• HTTPS
• AES-256 server-side encryption
• AWS or customer-provided or customer-managed keys
• Each object gets its own key
Amazon EBS
• End-to-end secure network traffic
• Whole volume encryption
• AWS or customer-managed keys
• Encrypted incremental snapshots
• Minimal performance overhead (uses Intel AES-NI)
S3 server encryption with AWS fully-managed keys
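[Diagram: the customer sends PHI over HTTPS to the S3 web server, which encrypts the plaintext PHI with a symmetric data key; the encrypted PHI goes to the S3 storage fleet. A master key encrypts the symmetric data key, producing an encrypted data key.]
The master key is managed by S3 and protected by systems internal to AWS, in a distinct system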
AWS Key Management Service
A service that enables you to provision and use encryption keys to protect your data
Allows you to create, use, and manage encryption keys from within:
– Your own applications via the AWS SDK
– Supported AWS services (Amazon S3, Amazon EBS, Amazon Redshift)
Available in all commercial regions
Can be used in a key hierarchy to secure data encryption keys protecting PHI
AWS services integrate with AWS KMS
• 2-tiered key hierarchy using envelope encryption
• Data keys encrypt customer data
• AWS KMS customer master keys encrypt data keys
• Benefits:
– Limits blast radius of compromised resources and their keys
– Better performance
– Easier to manage a small number of master keys than billions of resource keys
[Diagram: master key(s) in KMS encrypt Data Keys 1 to 5 (keys encrypted); the data keys encrypt an S3 object, an EBS volume, an Amazon RDS instance, an Amazon Redshift cluster, and your application's data (data encrypted)]
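The two-tier key hierarchy described above can be sketched in a few lines. This is an illustration of the envelope pattern only, under loud assumptions: `toy_cipher` is a stand-in keystream cipher built from SHA-256 so that the example runs with the standard library alone. It is NOT AES and must not be used to protect real data. In AWS the master key would stay inside KMS; here it is held in a local variable purely for demonstration, and all function names are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Stand-in for a real cipher such as AES-256; NOT secure for real use.
    Applying it twice with the same key and nonce returns the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt data with a fresh data key, then encrypt that key with the master key."""
    data_key = secrets.token_bytes(32)    # one data key per object
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": toy_cipher(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        "encrypted_data_key": toy_cipher(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    """Recover the data key with the master key, then recover the data."""
    data_key = toy_cipher(master_key, envelope["key_nonce"], envelope["encrypted_data_key"])
    return toy_cipher(data_key, envelope["data_nonce"], envelope["ciphertext"])


master_key = secrets.token_bytes(32)      # assumption: held locally for the demo
record = b"example record"
envelope = envelope_encrypt(master_key, record)
recovered = envelope_decrypt(master_key, envelope)
print(recovered == record)
```

Only the encrypted data key is stored next to the ciphertext, so a single master key can protect any number of per-object data keys, which is the management benefit the slide points to.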