DataONE Cyberinfrastructure
Matt Jones, Dave Vieglais, Bruce Wilson
Foremost a Federation
Member Nodes (MNs)
• Heart of the federation
• Harness the power of local curation
Coordinating Nodes (CNs)
• Services to link Member Nodes
Investigator Toolkit (ITK)
• Tools for the whole data lifecycle
Interoperability
• Scalable
• Usable by people and agents
• Resilient to technical and institutional change
• Adaptive to evolving standards
• Inclusive of existing communities and tools
• Cognizant of sociological drivers
• Informed by prior and current work
Requirements for DataONE
Why a Federation?
Diverse Federation == Resilience
• Failover for temporary outages
• Insurance against project/institutional failure
Diverse Federation == Scalability
• Storage increases with Member Nodes
• Incremental costs to each MN to replicate
• Distributes sustainability costs
Authoritative members of the Federation
• Curate their own data holdings
  • Provide unique identifiers for each object
  • Ensure availability, quality, and reliability
• Replicate holdings for other MNs
• Provide access and access control
• Log and report accesses to objects
• Engage with DataONE community
• Deploy a DataONE-compatible software system
Member Nodes
Implementation Tiers
Tier 1: Supports publicly readable content without authentication or more specific access control rules.
Tier 2: Tier 1 plus access control support.
Tier 3: Tier 2 plus the ability to add content through the DataONE service interfaces, with full support for interaction with DataONE Investigator Toolkit applications and plugins.
Tier 4: Supports the full set of DataONE APIs and can operate as a replication target, accepting content from compatible (technical and policy) Member Nodes and fully supporting the DataONE content access control rules.
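Because each tier includes everything in the tier below it, the tiers can be modeled as an ordered scale. The following is a minimal sketch under that assumption; the names `Tier` and `supports` are hypothetical and are not part of any DataONE library.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical labels for the four implementation tiers on the slide."""
    PUBLIC_READ = 1          # publicly readable content, no authentication
    ACCESS_CONTROL = 2       # Tier 1 plus access control support
    CONTENT_ADD = 3          # Tier 2 plus adding content via service interfaces
    REPLICATION_TARGET = 4   # full API set, can accept replicas from other nodes


def supports(node_tier: Tier, required: Tier) -> bool:
    """A node supports a capability if its tier is at or above the required one."""
    return node_tier >= required
```

For example, a Tier 3 node would pass a check for access control but not for acting as a replication target.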
Characterizing Member Nodes
Diverse Contributors
• Individual investigators
• Field stations and networks
• Government agencies
• Non-profit partnerships
• Scientific Societies
• Synthesis centers
[Figure: Data Sizes; x-axis: size in MB (< 1, 1-10, 10-200, > 200); y-axis: % (0 to 60)]
Data Types
• Ecological
• Environmental
• Demographic
• Social/Legal/Economic
Coordinating Nodes
Provide coordinating services
• Search and Discovery
• Preservation monitoring
• Object tracking and replica management
• User identity management
• Logging and monitoring
Optimized
• High availability
• Performance
• Scalability
The Investigator Toolkit
• Discovery tools
• Data Management tools
• Analysis and modeling tools
• Citation and publication tools
Investigator Toolkit
Web Interface; Analysis, Visualization; Data Management
Client Libraries: Java, Python, Command Line
Collect
Assure
Describe
Deposit
Preserve
Discover
Integrate
Analyze
Data Lifecycle
Morpho
Goal: Uniquely identify data or metadata objects
• Support the several identifier types widely used
• Identifiers assigned by Member Nodes
• Uniqueness ensured by Coordinating Nodes
• Resolution through Coordinating Nodes
Identify objects
LSID, PURL, GUID {3F2504E0-4…
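The division of labor above (Member Nodes assign identifiers, Coordinating Nodes ensure uniqueness and resolve them) can be sketched with a small in-memory registry. This is an illustration only; `IdentifierRegistry`, its methods, and the node names are hypothetical and do not reflect the actual DataONE service interfaces.

```python
class IdentifierRegistry:
    """Toy stand-in for the Coordinating Node role: uniqueness and resolution."""

    def __init__(self):
        # identifier -> list of node names holding a copy of the object
        self._holders = {}

    def register(self, identifier, member_node):
        """Record an identifier assigned by a Member Node; reject duplicates."""
        if identifier in self._holders:
            raise ValueError(f"identifier already registered: {identifier}")
        self._holders[identifier] = [member_node]

    def add_replica(self, identifier, member_node):
        """Note that another node now holds a copy of a registered object."""
        self._holders[identifier].append(member_node)

    def resolve(self, identifier):
        """Return the nodes from which the object can be retrieved."""
        return list(self._holders[identifier])
```

The identifier is treated as an opaque string, so the same registry would work for any of the identifier types listed on the slide.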
Identify people
• Identity provider selected by the user
• Member nodes define access rules
• Rules propagated by Coordinating Nodes
• Identity and access control consistent across entire infrastructure
KNB, Generic, Native, Proxy
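One way to picture rules that are defined once at a Member Node and then evaluated the same way everywhere is a list of (subject, permission) pairs. The sketch below assumes an ordered set of permissions and a special "public" subject; both are assumptions made for illustration, as are the names `AccessRule` and `is_allowed`.

```python
from dataclasses import dataclass

# Assumed ordering: a higher level implies all lower ones.
PERMISSION_LEVELS = {"read": 1, "write": 2, "changePermission": 3}


@dataclass(frozen=True)
class AccessRule:
    subject: str      # identity string from the user's identity provider
    permission: str   # one of the keys in PERMISSION_LEVELS


def is_allowed(rules, subject, requested):
    """True if any rule grants the subject (or 'public') the requested level."""
    needed = PERMISSION_LEVELS[requested]
    return any(
        rule.subject in (subject, "public")
        and PERMISSION_LEVELS[rule.permission] >= needed
        for rule in rules
    )
```

Since the rule list travels with the object, any node that receives it can apply the same check and reach the same decision.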
Deposit Data and Metadata
Science metadata
• EML, FGDC, DC, ISO, DIF, …
System metadata
• Globally unique IDs for data & metadata (DOI, GUID, Hdl, …)
• Checksums of objects
• Object policies
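As a rough illustration of the system metadata listed above, the sketch below builds a record holding an identifier, a checksum, and one object policy. The field names, the choice of SHA-256, and the single `replication_allowed` flag are assumptions made for this example, not the DataONE system metadata schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SystemMetadata:
    identifier: str            # globally unique ID (DOI, GUID, handle, ...)
    checksum: str              # hex digest of the object's bytes
    checksum_algorithm: str
    size: int                  # object size in bytes
    replication_allowed: bool  # stand-in for an object policy


def describe(identifier, data, replication_allowed=True):
    """Build a system metadata record for an object at deposit time."""
    return SystemMetadata(
        identifier=identifier,
        checksum=hashlib.sha256(data).hexdigest(),
        checksum_algorithm="SHA-256",
        size=len(data),
        replication_allowed=replication_allowed,
    )
```

Science metadata (EML, FGDC, and so on) would be deposited as a separate object with its own identifier and its own record of this kind.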
Preserve Data and Metadata
• Metadata mirrored at Coordinating Nodes
• Data replicated between Member Nodes
• CNs manage copies
• Checksums recorded and verified
• Promote quality metadata
Coordinating Nodes
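Recording and verifying checksums across replicas might look like the following sketch, which recomputes the digest of each copy and reports the nodes whose copy no longer matches. The function name, the use of SHA-256, and the in-memory `replicas` mapping are all assumptions for illustration.

```python
import hashlib


def find_corrupt_replicas(recorded_checksum, replicas):
    """Return the names of nodes whose copy does not match the recorded checksum.

    `replicas` maps a node name to the bytes of that node's copy.
    """
    return sorted(
        node
        for node, data in replicas.items()
        if hashlib.sha256(data).hexdigest() != recorded_checksum
    )
```

A node reported by such a check could then have its copy replaced from a replica that still verifies.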
Discover Content
Integrate and Analyze
[Figure: water temperature (bottom, 10 m ADCP); x-axis: Time (01:00 to 17:00); y-axis: Temperature, degrees C (24.20 to 24.50)]
Graphs and derived data can be archived in DataONE
Analysis and Visualization
Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
Diverse bird observations and environmental data from 300,000 locations in the US integrated and analyzed using High Performance Computing Resources
Land Cover
Meteorology
MODIS – Remote sensing data
Slide from S. Kelling
• Examine patterns of migration
• Infer how climate change may affect bird migration
Model results
Occurrence of Swainson’s Hawk
Jan, Apr, Jun, Sep, Dec
DataONE System Overview
Deploy core infrastructure supporting four fundamental services:
• Persistent, unique identifiers
• Bit-level preservation
• Search and retrieval
• Federated identity
Along with:
• Build out and deployment of Member Nodes
• Add ITK functionality
• Test, test, test
• Ramp up R&D on additional features
DataONE Activities Through Year 2
Software Delivered at Public Release
Investigator Toolkit Software: Search Portal, R Client, Morpho, Zotero, Fuse FS, Excel, Mendeley, …
Client Libraries: Java, Python, Command Line
Member Node Software: Metacat, Dryad, GMN, CUAHSI, Merritt
Coordinating Node Software
• Service Interfaces: Preservation Monitor, Catalog, Identifiers, Replication, Discovery, Resolution, Registration
• Object Store
• Index
DataONE Service Programming Interface (SPI)
• Data sub-setting, transformation
• Visualization
• Workflow support
• Semantic search
• Semantic data integration
• Computational, or specialized nodes
• Investigator Toolkit expansion
DataONE Activities: Years 3-5
DMP-Tool
Cyberinfrastructure Outline
• CI Architecture, Requirements, and Design
• Member Nodes
• Coordinating Nodes
• Investigator Toolkit
• Demonstrations
Collect
Assure
Describe
Deposit
Preserve
Discover
Integrate
Analyze
Demonstrations
Morpho
Describing and depositing with Morpho
Data discovery
File system access
R plugin demonstration
Value of DataONE
• Discovery and access: Enabling discovery and universal access to data about life on earth from around the world
• Data integration and synthesis: Providing transformational tools that enable cross-cutting research
• Education and training: Providing essential skills (e.g., data management training, best practices) for scientific enquiry
• Building community: Combining expertise and resources across diverse communities to collectively educate, advocate, and support stewardship of scientific data
• Data Sharing: Providing incentives and infrastructure for sharing data from federally funded researchers