Archive Ingest Redesign March 14, 2003


Page 1: Archive Ingest Redesign March 14, 2003

Implementation Review 1

Archive Ingest Redesign March 14, 2003

Page 2: Archive Ingest Redesign March 14, 2003

Archive Ingest Redesign high-level requirements

Port Ingest system from Open VMS to Unix
  Ingest will be the last remaining back-end function on Open VMS.
  Ingest will run under Solaris on the 15k.
Make Ingest scalable for future increase in data volume post-SM4
Improve throughput and reliability
Decouple Ingest from Distribution software for ease of operation and maintenance
Improve system maintainability
  Facilitate Ingest changes that are driven by changes in data structure during science instrument lifetimes.

Page 3: Archive Ingest Redesign March 14, 2003

Current OPUS data processing / DADS Ingest interface

Historically, data processing and archive systems have developed independently.
  Data processing system went from PODPS to OPUS.
  Archive system went from DMF to ST-DADS.
  In the past, these systems have not even operated within the same security environment.
This paradigm does not work with the current archive philosophy.
  On-the-Fly Reprocessing (OTFR) requires integration of data processing and archive distribution functionality.
  Enhanced data processing, particularly database catalogs, requires closer coupling of data processing and archive system.
To address this change, software maintenance for data processing and archive systems is now in one branch.

Page 4: Archive Ingest Redesign March 14, 2003

Current OPUS data processing / DADS Ingest interface (cont.)

[Diagram: current data flow. Labels: PACOR-A, pod file, HST Science Data Receipt, Science Data Processing, Calibration, uncalibrated FITS dataset, calibrated FITS dataset, pod file & uncalibrated FITS dataset, Ingest, Archive Catalog, NSA, metadata, MO disk. Components are marked as belonging to OPUS or DADS.]

Page 5: Archive Ingest Redesign March 14, 2003

Ingest Functionality

Extract metadata from data header keyword values and populate archive science catalog
Write data files to archive storage media
Catalog location and properties of files in archive database
Validate integrity of data files
Set proprietary status of data files
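
The first function listed above, extracting metadata from header keyword values, can be illustrated with a minimal sketch. It relies only on the standard FITS header layout (fixed 80-character cards, keyword in the first 8 characters, "= " as the value indicator, an END card closing the header). The keyword selection and the function names are assumptions made for illustration; they are not taken from the Ingest design.

```python
CARD_LEN = 80  # FITS header cards are fixed-width 80-character records

# Hypothetical selection of keywords to copy into the science catalog.
CATALOG_KEYWORDS = ("INSTRUME", "ROOTNAME", "EXPTIME")


def parse_card(card):
    """Return (keyword, value) for a value card, or None for other cards."""
    if len(card) < 10 or card[8:10] != "= ":
        return None  # COMMENT, HISTORY, END, and blank cards carry no value
    keyword = card[:8].strip()
    body = card[10:].strip()
    if body.startswith("'"):
        # String value: runs to the closing quote; '' is an escaped quote.
        end = 1
        while end < len(body):
            if body[end] == "'":
                if body[end + 1:end + 2] == "'":
                    end += 2
                    continue
                break
            end += 1
        return keyword, body[1:end].replace("''", "'").rstrip()
    text = body.split("/", 1)[0].strip()  # drop the trailing comment
    if text in ("T", "F"):
        return keyword, text == "T"
    try:
        return keyword, int(text)
    except ValueError:
        pass
    try:
        return keyword, float(text)
    except ValueError:
        return keyword, text  # leave unrecognized values as raw text


def extract_metadata(header_text, wanted=CATALOG_KEYWORDS):
    """Scan 80-character cards up to END and collect the wanted keywords."""
    row = {}
    for start in range(0, len(header_text), CARD_LEN):
        card = header_text[start:start + CARD_LEN]
        if card[:8].strip() == "END":
            break
        parsed = parse_card(card)
        if parsed and parsed[0] in wanted:
            row[parsed[0]] = parsed[1]
    return row
```

The returned dictionary stands in for one row of the archive science catalog. A production reader would also have to handle details this sketch leaves out, such as continued strings and Fortran-style "D" exponents.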

Page 6: Archive Ingest Redesign March 14, 2003

Goals of Ingest Redesign project

Make Ingest more compatible with current science instrument design
  It is almost impossible to enhance the fragile Open VMS DADS system for new science instruments without breaking existing functionality.
Bring Ingest requirements up to date
  No longer support GEIS format in archive
    Create final archive for HST first generation science instruments
  No ingest of raw engineering data or subset engineering data
    CCS is now HST engineering data archive
Improve operator control of the system

Page 7: Archive Ingest Redesign March 14, 2003

Status of Ingest Redesign project

Ingest Ops Concept complete and distributed on February 20, 2003

Requirement definition in progress

Page 8: Archive Ingest Redesign March 14, 2003

Highlights of Ingest Ops Concept

Represents a significant simplification in the data system architecture
  Deploy Ingest as a natural extension of data processing pipelines.
Build Ingest on OPUS architecture
  OPUS software system has over 7 years of operational experience on HST
  Risk mitigated by using a proven architecture
  Time to deployment will be reduced
Consistent with JWST concept for data processing and archive systems
  Same software will be used for both HST and JWST

Page 9: Archive Ingest Redesign March 14, 2003

Highlights of Ingest Ops Concept (cont.)

[Diagram: proposed data flow. Labels: PACOR-A, pod file, HST Science Data Receipt, Core SDP, Calibration, uncalibrated FITS dataset, calibrated FITS dataset, pod file & uncalibrated FITS dataset, Ingest pipeline, Ingest pipeline (pod file), archive science catalog population, Archive Science Catalog, House-keeping Catalog, NSA, metadata, Data depot on EMC, MO disk. Components are marked as belonging to OPUS or DADS.]

Page 10: Archive Ingest Redesign March 14, 2003

Highlights of Ingest Ops Concept (cont.)

Reduces amount of data shuffling and conversions between different software systems
  E.g., current WFPC2 science data processing pipeline

[Diagram: current WFPC2 pipeline spanning Solaris, Open VMS, and Tru64 Unix. Labels: OPUS Generic Conversion, CALWP2, FITS2GEIS, stwfits, wFITS, Pass files from OPUS to DADS, DADS Ingest, MO disk, Standard FITS, VMS GEIS (files not readable on Tru64).]

Page 11: Archive Ingest Redesign March 14, 2003

Highlights of Ingest Ops Concept (cont.)

Reduces amount of data shuffling and conversions between different software systems (cont.)
  Future WFPC2 science data processing pipeline

[Diagram: future WFPC2 pipeline on Solaris. Labels: OPUS Generic Conversion, CALWP2, Ingest, Data depot on EMC, Standard FITS.]

Page 12: Archive Ingest Redesign March 14, 2003

Benefits of Ops Concept

All operations on data handled in a single data flow.
  Create FITS file, populate header keyword values, extract metadata from keyword values, populate science component of archive catalog
  No duplication of development effort or functionality
  Consistent development, testing, and operations helps ensure quality of archive catalog
Facilitates easier delivery of header changes
  Keyword changes can be built, tested, and deployed within a single subsystem
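
As a rough illustration of the single data flow described above, the sketch below drives header population, metadata extraction, and catalog population from one keyword definition, so a keyword change is made in one place. Everything in it is hypothetical: the keyword list, the function names, and the use of an in-memory SQLite table as a stand-in for the archive science catalog are assumptions, not the OPUS or DADS implementation.

```python
import sqlite3

# Hypothetical keyword set shared by every stage of the flow.
KEYWORDS = ("ROOTNAME", "INSTRUME", "EXPTIME")


def populate_header(observation):
    """Stage 1: build the header keyword values for a new dataset."""
    return {key: observation[key.lower()] for key in KEYWORDS}


def catalog_row(header):
    """Stage 2: extract the catalog metadata from the same header."""
    return tuple(header[key] for key in KEYWORDS)


def create_catalog(conn):
    """Create the stand-in catalog table, one untyped column per keyword."""
    columns = ", ".join(key.lower() for key in KEYWORDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS science ({columns})")


def populate_catalog(conn, row):
    """Stage 3: insert one row into the stand-in science catalog."""
    placeholders = ", ".join("?" for _ in KEYWORDS)
    conn.execute(f"INSERT INTO science VALUES ({placeholders})", row)


def run_flow(conn, observations):
    """Push every observation through all stages and return the row count."""
    create_catalog(conn)
    for obs in observations:
        populate_catalog(conn, catalog_row(populate_header(obs)))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM science").fetchone()[0]
```

Adding a keyword to `KEYWORDS` changes the header, the extracted row, and the catalog columns together, which is the property the slide attributes to building, testing, and deploying keyword changes within a single subsystem.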

Page 13: Archive Ingest Redesign March 14, 2003

Benefits of Ops Concept (cont.)

Decouples Ingest and Distribution software
  Although both will utilize much of the same hardware, such as the Data depot, 15k, and database
Provides opportunity for consolidation of OPUS and DADS based operator tools
Provides opportunity to automate data validation

Page 14: Archive Ingest Redesign March 14, 2003

Ingest Redesign Schedule

Ingest Operational Concept complete and distributed on February 20, 2003.
Requirement specification in progress
  To be completed by April 15, 2003
The remainder of the schedule is very preliminary pending requirement scoping and build planning
  Design review: June 2003
  Phased development in OPUS builds between June 2003 and March 2004
  System tests: March – April 2004
  Deploy system: May 2004

Page 15: Archive Ingest Redesign March 14, 2003

Summary of Data Systems software ports to Solaris

Over the last few years, HST data processing systems have been ported from Open VMS to Solaris:
  OPUS infrastructure
    Ported to Unix for FUSE – February 1998
    Current version tested under Solaris
  HST Science Instrument pipeline applications
    Ported to Tru64 Unix – October 1999
    Testing on Solaris in progress, minor changes anticipated
  HST Engineering Data Processing pipelines
    Ported to Solaris – February 2003

Page 16: Archive Ingest Redesign March 14, 2003

Summary of Data Systems software ports to Solaris (cont.)

HST archive systems port from Open VMS to Solaris in progress:
  Data Distribution system
    completion expected in summer 2003
  Archive Ingest system
    completion expected in spring 2004
With completion of Archive Ingest System redesign project, all data systems will be running under Solaris.
No other major system enhancement projects expected through end of HST mission.