Archiving in the Data Environment of Heliophysics at NASA

Transcript of Archiving in the Data Environment of Heliophysics at NASA

Page 1: Archiving in the Data Environment of Heliophysics at NASA

Science Archives Workshop - April 25, 2007 - Page 1

Archive Policies and Implementation: A Personal View from a NASA Heliophysics Data Policy Perspective

D. Aaron Roberts

NASA GSFC

25 April 2007

Page 2: Archiving in the Data Environment of Heliophysics at NASA

Define: Archive (some Google results)

A site containing a large number of files, possibly acquired over time, and often publicly accessible. (100 Best Web Hosting)

A function permitting users to copy one or more files to a long-term storage device. Archive copies can:

Accompany descriptive information; Imply data compression software usage; Be retrieved by archive date, file name, or description

(Tivoli Storage Manager)

Archive is a London-based Trip-hop group. (Wikipedia)

Page 3: Archiving in the Data Environment of Heliophysics at NASA

Science Data Archive Definition

Easily accessible, scientifically useable, well-documented, secure data = a good archive.

Requires:
• Open data policy
• Independently useable data
• Science input (data preparation and serving)
• Proper registration and backup

Page 4: Archiving in the Data Environment of Heliophysics at NASA

Archiving Homilies

• Archiving is a journey, not a destination
  • “Archive early, archive often” as a natural extension of serving data
• “Central” archiving is more about knowledge than acquisition
• Knowledge must be easily available: presentation matters
• The customer is always right
• Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
• Consider the legacy

Page 5: Archiving in the Data Environment of Heliophysics at NASA

Archiving is a journey

Properly described, well-documented, accessible data should easily move from one archiving stage to the next:

• NASA missions produce Active Archives (nothing is “ingested”)
  • Products, delivery, and initial long-term data plans in Project Data Management Plan
  • Virtual Observatories provide uniform descriptions and access to many such archives
• The archive continues to develop in the extended mission
  • A Mission Archive Plan provides updates to the Senior Reviews on status, plans, and actions for post-mission products and service
• After the mission, a Resident Archive can continue to serve data
  • Active upgrades of data products to be funded by other means
  • NSSDC manages the RAs
• “Permanent” archiving may just be moving the data and documentation to a more generic Resident Archive (e.g., SDAC, SPDF) for continued access
• At all stages, backups and registries maintain safety and knowledge of the data products
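
The stages on this slide can be read as an ordered lifecycle. The following is a minimal Python sketch, assuming a strictly linear progression; the stage names come from the slide, while the `Stage` enum and the `next_stage` helper are hypothetical names introduced only for illustration and are not part of any NASA system.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Archiving stages named on the slide, in order (illustrative only)."""
    ACTIVE_ARCHIVE = 1     # produced and served by the mission
    RESIDENT_ARCHIVE = 2   # continues to serve data after the mission
    PERMANENT_ARCHIVE = 3  # e.g., moved to a more generic Resident Archive


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None at the end of the journey."""
    members = list(Stage)  # Enum members iterate in definition order
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None
```

In this reading, the data and documentation are the same objects at every stage; only the party serving them changes.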

Page 6: Archiving in the Data Environment of Heliophysics at NASA

“Central” archiving

More about knowledge than acquisition: What exists? Where is it? Is it well documented? Is it safe?

New focus for NSSDC role (at least for HP): knowledge of data environment; management of RAs.

(Harvested) VO registries augmented as needed can provide a complete set of resources.

Information about the above should be available in ways that provide easy overviews as well as details.
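
One way to picture the four questions on this slide is as fields of a registry record. The sketch below is an assumption made for illustration: `RegistryEntry`, its field names, and `open_questions` are hypothetical and do not reflect the actual NSSDC or VO registry schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegistryEntry:
    """Hypothetical registry record; field names are illustrative only."""
    name: str                                                     # What exists?
    access_url: str                                               # Where is it?
    documentation_urls: List[str] = field(default_factory=list)  # Is it well documented?
    backup_locations: List[str] = field(default_factory=list)    # Is it safe?

    def open_questions(self) -> List[str]:
        """List the slide's questions that this entry cannot yet answer."""
        gaps = []
        if not self.access_url:
            gaps.append("Where is it?")
        if not self.documentation_urls:
            gaps.append("Is it well documented?")
        if not self.backup_locations:
            gaps.append("Is it safe?")
        return gaps
```

For example, `RegistryEntry(name="example product", access_url="https://example.org/data").open_questions()` returns `["Is it well documented?", "Is it safe?"]`, which is the kind of easy overview the slide asks for.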

Page 7: Archiving in the Data Environment of Heliophysics at NASA

The customer is always right

The community determines directions: Peer review of VOs, RAs, Data Centers, Missions: What is working? What could be improved? What can go?

HP Data and Computing Working Group provides feedback on HQ directions

“Top down vision, bottom-up implementation”

“Market-driven” including what we want from archives

Page 8: Archiving in the Data Environment of Heliophysics at NASA

It’s the metadata, stupid

Standards that work:
• Value of sharing data
• SPASE data model provides a uniform description of data products
• SPASE description + data = “SIP”, “AIP”, and “DIP”
• Preserved data should be in common, open, supported formats (e.g., FITS, HDF, CDF, documented ASCII, …)

Communication and other standards TBD

Important to decide the level of description
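
The equation “description + data” suggests that one self-describing package can serve for submission, archiving, and dissemination alike (SIP, AIP, and DIP are presumably the information-package terms of the OAIS reference model). A minimal Python sketch of that idea follows, under stated assumptions: `build_package` and `verify_package` are hypothetical helpers, and the description is an arbitrary dictionary, not the real SPASE schema.

```python
import hashlib


def build_package(description: dict, data: bytes) -> dict:
    """Bundle a metadata description with a fingerprint of the data it describes."""
    return {
        "description": description,                  # stand-in for a SPASE-style description
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity information for the data
        "size_bytes": len(data),
    }


def verify_package(package: dict, data: bytes) -> bool:
    """Check that the data still match the checksum recorded in the package."""
    return hashlib.sha256(data).hexdigest() == package["sha256"]
```

Because the description and the checksum travel with the data, the same package can be handed from one archive to the next and re-verified on arrival.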

Page 9: Archiving in the Data Environment of Heliophysics at NASA

Consider the legacy

Preserving and serving what matters for the long term: What is most useful? (If “all” is not possible) What works now, and what will last (and how)?

Calibrated, best-effort products should accompany level-zero plus software/algorithms

Page 10: Archiving in the Data Environment of Heliophysics at NASA

A model Heliophysics never quite implemented

Main problems:

(1) “Planning” is a mission function (in collaboration with VOs and others)

(2) “Ingest” is replaced by “production” and “transfer”

(3) “Access” is a distributed function as are the archives in general

Page 11: Archiving in the Data Environment of Heliophysics at NASA

The New Heliophysics Mission Data Lifecycle and Framework

Page 12: Archiving in the Data Environment of Heliophysics at NASA

Summary

• Easily accessible, scientifically useable, well-documented, secure data = a good archive.
• Archiving is a journey, not a destination
• “Central” archiving is more about knowledge than acquisition
• Knowledge must be easily available: presentation matters
• The customer is always right
• Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
• Consider the legacy

Page 13: Archiving in the Data Environment of Heliophysics at NASA

Backup Slides (HP Data Policy)

Page 14: Archiving in the Data Environment of Heliophysics at NASA

The HP Data Environment

Data from the Heliophysics Great Observatory reside in a distributed environment and are served from multiple sources.

• Multimission Data Centers
  • Solar Data Analysis Center
  • Space Physics Data Facility (CDAWeb, OMNIWeb, etc.)
  • National Space Science Data Center
• Mission-level active archives: e.g., ACE, TIMED, TRACE, Cluster, etc.
• Many of our data are served from individual instrument sites.

We are moving into a new data environment of
• Virtual Observatories for convenient search and access of the distributed data, and
• Resident Archives to retain the distributed data sources even after mission termination.

We have a Data and Computing Working Group to help us move ahead.

Page 15: Archiving in the Data Environment of Heliophysics at NASA

Goals of the HP Science Data Management Policy

Improve management of and access to HP mission data.

Clarify the architecture and associated data lifecycle milestones of the data environment.

Provide guidelines for proposals, Project Data Management Plans, NRAs, peer reviews, and other activities related to the HP data environment.

Page 16: Archiving in the Data Environment of Heliophysics at NASA

Basic Philosophy

Evolve the existing HP data environment: take advantage of new computer and Internet technologies to respond to our evolving mission set and community research needs (enable the HP Great Observatory).

Blend ‘bottom-up’, ‘market-driven’ implementation approaches with a ‘top-down’ vision for an integrated data environment.

Assure that the HP science community participates in all levels of data management.

Page 17: Archiving in the Data Environment of Heliophysics at NASA

Guiding Principles

All data produced by the HP missions will be open and made available as soon as is practical.

Gurman's “Right Amount of Glue” from the Fall 2002 AGU meeting sets the philosophy [see http://lwsde.gsfc.nasa.gov], a key component of which is a standard of behavior: share one’s data with everyone.

Data will be independently scientifically usable:
• adequate documentation including uniform SPASE descriptions
• sustainable and open data formats
• easy electronic access
• provision of appropriate analysis tools

Page 18: Archiving in the Data Environment of Heliophysics at NASA

Architecture

The environment will be distributed:
• Many archives with different internal workings
• Data integration capabilities provided by discipline-based virtual observatories (“VxOs”; VSO first for x = “Solar” and now 5 others), linked by a central dictionary (“SPASE Data model”) and machine-to-machine communication routines.

Easily permits the inclusion of essential data sets from non-NASA sources.

Provides a context for services and advanced analysis tools developed under, e.g. AISRP, LWS TR&T, and the VxOs.

Page 19: Archiving in the Data Environment of Heliophysics at NASA

Policy Recommendations, Etc.

The Policy includes:
• Roles of data environment components
• “Rules of the Road” for data use
• Recommendations for Project Data Management Plans and Mission Archive Plans
• A timeline of the HP mission data lifecycle

Page 20: Archiving in the Data Environment of Heliophysics at NASA

Implementation

Use peer-review processes to assist in managing the elements of the environment.

NRAs for: (a) VxOs, (b) Data quality and access improvement, (c) Resident Archives, and (d) Value-added services.

Mission and Data Center Senior Reviews; RA reviews.

Success will be determined by community use and feedback. The process is “market-driven.”

Page 21: Archiving in the Data Environment of Heliophysics at NASA

Current Activities

Finalizing the Data Policy with community input. Our goal is to have this ready for the MIDEX AO.

Implementing a second round of VxOs and processing the next round of proposals for VxOs and related services.

Coordinating these efforts through frequent interactions and work with the SPASE group.

Implementing Resident Archives and the processes to manage these archives.

Working with new missions to incorporate the Data Policy from the start, and “retrofitting” older missions through VxOs and other means.

Working on collaboration with other NASA science divisions, other US agencies, and international partners.

Maintaining a web site for latest news about our data environment: http://hpde.gsfc.nasa.gov.