Matt Jones, Dave Vieglais, Bruce Wilson - DataONE Cyberinfrastructure

Page 1

DataONE Cyberinfrastructure

Matt Jones, Dave Vieglais, Bruce Wilson

Page 2

Foremost a Federation

Member Nodes (MNs)
• Heart of the federation
• Harness the power of local curation

Coordinating Nodes (CNs)
• Services to link Member Nodes

Investigator Toolkit (ITK)
• Tools for the whole data lifecycle

Interoperability

Page 3

Requirements for DataONE

• Scalable
• Usable by people and agents
• Resilient to technical and institutional change
• Adaptive to evolving standards
• Inclusive of existing communities and tools
• Cognizant of sociological drivers
• Informed by prior and current work

Page 4

Why a Federation?

Diverse Federation == Resilience
• Failover for temporary outages
• Insurance against project/institutional failure

Diverse Federation == Scalability
• Storage increases with Member Nodes
• Incremental costs to each MN to replicate
• Distributes sustainability costs
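The "failover for temporary outages" point above can be sketched in a few lines of Python. This is an illustration only, not DataONE code: `fetch_with_failover`, `NodeUnavailable`, the node names, and the in-memory `replicas` mapping are all hypothetical.

```python
from typing import Callable, Iterable, Optional


class NodeUnavailable(Exception):
    """Raised by a (hypothetical) node reader when the node cannot be reached."""


def fetch_with_failover(
    object_id: str,
    nodes: Iterable[str],
    read_from: Callable[[str, str], bytes],
) -> Optional[bytes]:
    """Try each node holding a replica in turn; return the first successful read."""
    for node in nodes:
        try:
            return read_from(node, object_id)
        except NodeUnavailable:
            continue  # temporary outage: fall through to the next replica
    return None  # no replica was reachable


# Hypothetical in-memory stand-in for two Member Nodes holding the same object.
# A value of None simulates a node that is currently down.
replicas = {"mn-a": None, "mn-b": {"obj-1": b"temperature,24.3\n"}}


def read_from(node: str, object_id: str) -> bytes:
    store = replicas[node]
    if store is None:
        raise NodeUnavailable(node)
    return store[object_id]


data = fetch_with_failover("obj-1", ["mn-a", "mn-b"], read_from)
```

Because the object is replicated on a second node, the read succeeds even though the first node is down; with only one replica the same call would return `None`.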

Page 5

Member Nodes

Authoritative members of the Federation
• Curate their own data holdings
    Provide unique identifiers for each object
    Ensure availability, quality, and reliability
• Replicate holdings for other MNs
• Provide access and access control
• Log and report accesses to objects
• Engage with DataONE community
• Deploy a DataONE-compatible software system

Page 6

Implementation Tiers

Tier 1: Supports publicly readable content without authentication or more specific access control rules.

Tier 2: Tier 1 plus access control support.

Tier 3: Tier 2 plus the ability to add content through the DataONE service interfaces, with full support for interaction with DataONE Investigator Toolkit applications and plugins.

Tier 4: Supports the full set of DataONE APIs and can operate as a replication target, accepting content from compatible (technical and policy) Member Nodes and fully supporting the DataONE content access control rules.
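Since each tier includes the one below it, the tiers can be modeled as an ordered enumeration. The following is a minimal sketch under that assumption; the enum member names, `supports`, `replication_targets`, and the node names are hypothetical and not part of any DataONE API.

```python
from enum import IntEnum
from typing import Dict, List


class Tier(IntEnum):
    """Member Node implementation tiers; each tier includes the ones below it."""
    PUBLIC_READ = 1      # publicly readable content, no authentication
    ACCESS_CONTROL = 2   # Tier 1 plus access control support
    WRITE = 3            # Tier 2 plus adding content through the service interfaces
    REPLICATION = 4      # full API set; can act as a replication target


def supports(node_tier: Tier, required: Tier) -> bool:
    """A node supports a capability if its tier is at least the required tier."""
    return node_tier >= required


def replication_targets(nodes: Dict[str, Tier]) -> List[str]:
    """Return the names of nodes that could accept replicas (Tier 4 only)."""
    return sorted(name for name, tier in nodes.items()
                  if supports(tier, Tier.REPLICATION))


# Hypothetical node names, for illustration only.
nodes = {"mn-a": Tier.PUBLIC_READ, "mn-b": Tier.REPLICATION, "mn-c": Tier.WRITE}
targets = replication_targets(nodes)
```

An ordered enum keeps the "Tier N plus ..." wording of the slide visible in code: a single comparison answers whether a node offers a given capability.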

Page 7

Characterizing Member Nodes

Diverse Contributors
• Individual investigators
• Field stations and networks
• Government agencies
• Non-profit partnerships
• Scientific Societies
• Synthesis centers

[Figure: "Data Sizes" bar chart; x-axis: size in MB (< 1, 1-10, 10-200, > 200); y-axis: % (0 to 60)]

Data Types
• Ecological
• Environmental
• Demographic
• Social/Legal/Economic

Page 9

Coordinating Nodes

Provide coordinating services
• Search and Discovery
• Preservation monitoring
• Object tracking and replica management
• User identity management
• Logging and monitoring

Optimized
• High availability
• Performance
• Scalability

Page 10

The Investigator Toolkit

• Discovery tools
• Data Management tools
• Analysis and modeling tools
• Citation and publication tools

Investigator Toolkit
Web Interface; Analysis, Visualization; Data Management

Client Libraries
Java; Python; Command Line

Page 11

Data Lifecycle

Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Page 12

Data Lifecycle

Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Morpho

Page 13

Identify objects

Goal: Uniquely identify data or metadata objects

• Support the several widely used identifier types
• Identifiers assigned by Member Nodes
• Uniqueness ensured by Coordinating Nodes
• Resolution through Coordinating Nodes

LSID, PURL, GUID {3F2504E0-4…
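The division of labor above (Member Nodes assign identifiers, Coordinating Nodes ensure uniqueness and resolve them) might look roughly like the sketch below. `IdentifierRegistry` and its methods are hypothetical names standing in for the Coordinating Node role; the GUID is generated with Python's standard `uuid` module purely as an example identifier.

```python
import uuid


class IdentifierRegistry:
    """Hypothetical stand-in for the Coordinating Node role described above."""

    def __init__(self) -> None:
        self._locations = {}  # identifier -> set of Member Node names

    def register(self, identifier: str, member_node: str) -> None:
        """Record a new identifier assigned by a Member Node; reject duplicates."""
        if identifier in self._locations:
            raise ValueError(f"identifier already in use: {identifier}")
        self._locations[identifier] = {member_node}

    def add_replica(self, identifier: str, member_node: str) -> None:
        """Note that another Member Node now holds a copy of an existing object."""
        self._locations[identifier].add(member_node)

    def resolve(self, identifier: str) -> list:
        """Return the Member Nodes known to hold the object, sorted by name."""
        return sorted(self._locations.get(identifier, ()))


registry = IdentifierRegistry()
guid = str(uuid.uuid4())          # identifier assigned by the Member Node
registry.register(guid, "mn-a")   # uniqueness is checked centrally
registry.add_replica(guid, "mn-b")
holders = registry.resolve(guid)
```

Registering the same identifier a second time raises `ValueError`, which is how this sketch models the uniqueness guarantee.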

Page 14

Identify people

• Identity provider selected by the user
• Member Nodes define access rules
• Rules propagated by Coordinating Nodes
• Identity and access control consistent across entire infrastructure
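One way to picture the access rules above is a small policy object that a Member Node attaches to each object and that every node evaluates the same way. This is a sketch under assumptions: `AccessPolicy`, the `PUBLIC` subject, the action names, and the example identity are hypothetical and do not reflect the actual DataONE rule format.

```python
from dataclasses import dataclass, field

PUBLIC = "public"  # hypothetical name for the unauthenticated subject


@dataclass
class AccessPolicy:
    """Rules a Member Node attaches to one object: subject -> allowed actions."""
    rules: dict = field(default_factory=dict)

    def allow(self, subject: str, *actions: str) -> None:
        """Grant one or more actions to a subject."""
        self.rules.setdefault(subject, set()).update(actions)

    def is_allowed(self, subject: str, action: str) -> bool:
        """Check the subject's own rules, then the rules that apply to everyone."""
        return (action in self.rules.get(subject, set())
                or action in self.rules.get(PUBLIC, set()))


policy = AccessPolicy()
policy.allow(PUBLIC, "read")
policy.allow("alice@idp.example", "read", "write")
```

If the same policy is propagated unchanged to every node that holds the object, the answer to `is_allowed` does not depend on which node is asked, which is the consistency property the slide describes.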

Page 15

Deposit Data and Metadata

[Diagram labels: KNB, Generic, Native, Proxy; <meta>]

Science metadata
• EML, FGDC, DC, ISO, DIF, …

System metadata
• Globally unique IDs for data & metadata (DOI, GUID, Hdl, …)
• Checksums of objects
• Object policies
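The system metadata items listed above (identifier, checksum, object policy) can be illustrated with a small record type. This is a sketch only: the `SystemMetadata` field names, the `describe` helper, the choice of SHA-256, and the example identifier are assumptions, not the DataONE schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemMetadata:
    """Hypothetical minimal record; field names are illustrative."""
    identifier: str
    checksum_algorithm: str
    checksum: str
    size_bytes: int
    replication_allowed: bool  # stands in for an "object policy"


def describe(identifier: str, content: bytes,
             replication_allowed: bool = True) -> SystemMetadata:
    """Build a system metadata record for an object at deposit time."""
    digest = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
    return SystemMetadata(identifier, "SHA-256", digest,
                          len(content), replication_allowed)


# Hypothetical identifier and content, for illustration only.
record = describe("doi:10.0000/example", b"site,temp_c\nA,24.3\n")
```

Computing the checksum once at deposit gives later preservation checks a fixed value to compare each copy against.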

Page 16

Preserve Data and Metadata

• Metadata mirrored at Coordinating Nodes
• Data replicated between Member Nodes
• CNs manage copies
• Checksums recorded and verified
• Promote quality metadata

[Diagram label: Coordinating Nodes]
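"Checksums recorded and verified" can be sketched as a periodic audit that compares each replica with the checksum recorded at deposit. The function names, node names, and the use of SHA-256 below are assumptions made for illustration.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of an object's bytes."""
    return hashlib.sha256(content).hexdigest()


def audit_replicas(recorded_checksum: str, replicas: dict) -> dict:
    """Map each node name to True if its copy still matches the recorded checksum."""
    return {node: sha256_hex(content) == recorded_checksum
            for node, content in replicas.items()}


def nodes_needing_repair(audit: dict) -> list:
    """Names of nodes whose copy failed verification, sorted."""
    return sorted(node for node, ok in audit.items() if not ok)


original = b"site,temp_c\nA,24.3\n"
recorded = sha256_hex(original)  # checksum recorded when the object was deposited

# Hypothetical replicas; the copy on mn-b has been altered.
replicas = {"mn-a": original, "mn-b": b"site,temp_c\nA,24.8\n"}
audit = audit_replicas(recorded, replicas)
```

A node that fails the audit could then be refreshed from any node that passes, which is one plausible reading of "CNs manage copies".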

Page 17

Discover Content

Page 18

Integrate and Analyze

[Figure: water temperature (bottom, 10m ADCP); x-axis: Time (01:00 to 17:00); y-axis: Temperature, degrees C (24.20 to 24.50)]

Graphs and derived data can be archived in DataONE

Page 19

Analysis and Visualization

Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration

Diverse bird observations and environmental data from 300,000 locations in the US integrated and analyzed using High Performance Computing Resources

Land Cover

Meteorology

MODIS – Remote sensing data

Slide from S. Kelling

• Examine patterns of migration
• Infer how climate change may affect bird migration

Model results

Occurrence of Swainson’s Hawk

[Figure panels: Jan, Apr, Jun, Sep, Dec]

Page 20

DataONE System Overview

Page 22

DataONE Activities Through Year 2

Deploy core infrastructure supporting four fundamental services:
• Persistent, unique identifiers
• Bit-level preservation
• Search and retrieval
• Federated identity

Along with:
• Build out and deployment of Member Nodes
• Add ITK functionality
• Test, test, test
• Ramp up R&D on additional features

Page 23

Software Delivered at Public Release

Investigator Toolkit Software: Search Portal, R Client, Morpho
Client Libraries: Java, Python, Command Line
Member Node Software: Metacat
Coordinating Node Software: Service Interfaces, Object Store, Index

Other diagram labels: Zotero, Fuse FS, Excel, Dryad, GMN, CUAHSI, Merritt, Preservation, Monitor, Catalog, Identifiers, Replication, Discovery, Resolution, Registration, Mendeley

DataONE Service Programming Interface (SPI)

Page 24

DataONE Activities: Years 3-5

• Data sub-setting, transformation
• Visualization
• Workflow support
• Semantic search
• Semantic data integration
• Computational, or specialized nodes
• Investigator Toolkit expansion

DMP-Tool

Page 25

Cyberinfrastructure Outline

• CI Architecture, Requirements, and Design
• Member Nodes
• Coordinating Nodes
• Investigator Toolkit
• Demonstrations

Page 26

Demonstrations

Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Page 27

Demonstrations

Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Morpho

Page 32

Describe and deposit with Morpho

Page 33

Data discovery

Page 34

File system access

Page 35

R plugin demonstration

Page 36

Value of DataONE

• Discovery and access: Enabling discovery and universal access to data about life on earth from around the world

• Data integration and synthesis: Providing transformational tools that enable cross-cutting research

• Education and training: Providing essential skills (e.g., data management training, best practices) for scientific enquiry

• Building community: Combining expertise and resources across diverse communities to collectively educate, advocate, and support stewardship of scientific data

• Data Sharing: Providing incentives and infrastructure for sharing data from federally funded researchers