Introduction to Apache OODT Yang Li Mar 9, 2012. What is OODT Object Oriented Data Technology...

Introduction to Apache OODT

Yang Li

Mar 9, 2012

What is OODT

• Object Oriented Data Technology

• Science data management

• Archiving Systems that span scientific disciplines

• Enable interoperability among data agnostic systems (astrophysics, planetary, space science data systems, open source web analytics)

History

• 2001

– Deployed to build a virtual specimen bank for the Early Detection Research Network (oncology)

• 2004

– Core architectural software of the Planetary Data System data distribution, deployed by NASA (planetary science)

• 2007

– Deployed for the Orbiting Carbon Observatory and SeaWinds missions (earth science)

• 2008

– Deployed for the National Polar-orbiting Operational Environmental Satellite System (atmospheric science)

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Catalog & Archive

• Deal with large-scale ingest of data, metadata extraction of data, post-processing of data into derived and higher-order products, cataloging of data, searching of catalogs, versioning, and retrieval

• Components

– Catalog, Crawling framework, Curation, File manager, Metadata, PCS, Push/Pull framework, Resource management, Workflow, CAS install, Web apps

Catalog

• Virtualize underlying catalogs for use in the CAS system

• Heterogeneous catalog models are mapped to a common dictionary and then integrated locally, so that queries can span them and data can be ingested into them
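
The mapping idea above can be illustrated with a small Python sketch. It is not the OODT catalog API: the catalog names, field names, and the `to_common` and `query_all` helpers are all hypothetical, chosen only to show how local field names might be translated into a shared dictionary before a cross-catalog query.

```python
# Hypothetical sketch: two catalogs with different field names are mapped
# to a common dictionary so that one query can run across both.

# Per-catalog mapping: local field name -> common dictionary term (assumed names).
FIELD_MAPS = {
    "catalog_a": {"fname": "Filename", "obs_date": "ObservationDate"},
    "catalog_b": {"file": "Filename", "date": "ObservationDate"},
}

# Toy catalog contents, each using its own local model.
CATALOGS = {
    "catalog_a": [{"fname": "a1.dat", "obs_date": "2012-03-01"}],
    "catalog_b": [
        {"file": "b1.dat", "date": "2012-03-01"},
        {"file": "b2.dat", "date": "2012-03-09"},
    ],
}


def to_common(catalog_name, record):
    """Rename a record's local fields to common dictionary terms."""
    mapping = FIELD_MAPS[catalog_name]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query_all(term, value):
    """Query every catalog through the common dictionary."""
    hits = []
    for name, records in CATALOGS.items():
        for record in records:
            common = to_common(name, record)
            if common.get(term) == value:
                hits.append(common)
    return hits
```

Calling `query_all("ObservationDate", "2012-03-01")` returns matching records from both catalogs, expressed in the common terms rather than in either catalog's local field names.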

CAS Crawler

• Standardize the common ingestion activities

– Identification of files and directories to crawl

– Satisfaction of ingestion pre-conditions

– Metadata extraction

• Ingestion
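
The ingestion activities listed above can be sketched as a single loop. This is a minimal Python illustration, not the CAS Crawler's actual interface: `crawl`, `is_non_empty`, `basic_metadata`, and `demo` are hypothetical names, and the "catalog" is just a dictionary.

```python
import os
import tempfile


def crawl(root, preconditions, extract_metadata, ingest):
    """Walk `root` and ingest every file that passes all preconditions."""
    ingested = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            # Skip files that fail any ingestion precondition.
            if not all(check(path) for check in preconditions):
                continue
            metadata = extract_metadata(path)
            ingest(path, metadata)
            ingested.append(path)
    return ingested


# Hypothetical precondition: only ingest files that have content.
def is_non_empty(path):
    return os.path.getsize(path) > 0


# Hypothetical extractor: record the file name and size as string values.
def basic_metadata(path):
    return {
        "Filename": [os.path.basename(path)],
        "FileSize": [str(os.path.getsize(path))],
    }


def demo():
    """Crawl a temporary directory holding one empty and one non-empty file."""
    catalog = {}

    def ingest(path, metadata):
        catalog[os.path.basename(path)] = metadata

    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "data.txt"), "w") as handle:
            handle.write("hello")
        open(os.path.join(root, "empty.txt"), "w").close()
        crawl(root, [is_non_empty], basic_metadata, ingest)
    return catalog
```

Passing the preconditions, the extractor, and the ingest action as arguments keeps the crawl loop itself unchanged when any one of those steps is swapped out.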

Curation

• A web application for managing policy for products and files and metadata that have been ingested via the CAS component

– Use a servlet container to deploy the web app

– Staging area

  • Directories on local machine holding data products

– Metadata generation area

  • Create metadata files to associate with data products

File Manager

• Provides everything needed to catalog, archive, and manage files and directories, along with their associated metadata

• Separate data stores and metadata stores as standard interfaces
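
One way to picture the separation of data stores from metadata stores is the Python sketch below. The class and method names are assumptions made for illustration and do not reproduce the OODT File Manager API; the in-memory stores stand in for whatever back ends a deployment would actually use.

```python
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Interface for wherever the file contents are archived."""

    @abstractmethod
    def put(self, product_id, content): ...

    @abstractmethod
    def get(self, product_id): ...


class MetadataStore(ABC):
    """Interface for wherever the metadata is cataloged."""

    @abstractmethod
    def put(self, product_id, metadata): ...

    @abstractmethod
    def get(self, product_id): ...


class InMemoryDataStore(DataStore):
    def __init__(self):
        self._files = {}

    def put(self, product_id, content):
        self._files[product_id] = content

    def get(self, product_id):
        return self._files[product_id]


class InMemoryMetadataStore(MetadataStore):
    def __init__(self):
        self._records = {}

    def put(self, product_id, metadata):
        self._records[product_id] = metadata

    def get(self, product_id):
        return self._records[product_id]


class FileManager:
    """Coordinates the two stores without knowing how either is implemented."""

    def __init__(self, data_store, metadata_store):
        self.data_store = data_store
        self.metadata_store = metadata_store

    def ingest(self, product_id, content, metadata):
        self.data_store.put(product_id, content)
        self.metadata_store.put(product_id, metadata)

    def retrieve(self, product_id):
        return self.data_store.get(product_id), self.metadata_store.get(product_id)
```

Because `FileManager` depends only on the two interfaces, either store could be replaced (for example, files on disk with metadata in a database) without touching the ingest and retrieve logic.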

Workflow

• Provides everything needed to execute workflows and science processing pipelines

• Separate workflow repositories and workflow engines as standard interfaces
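
The split between where workflows are defined and how they are executed might look like the following Python sketch. It is illustrative only: `WorkflowRepository`, `SequentialEngine`, and the two tasks are hypothetical and much simpler than a real workflow manager.

```python
class WorkflowRepository:
    """Holds workflow definitions: workflow name -> ordered task names."""

    def __init__(self, definitions):
        self._definitions = dict(definitions)

    def tasks_for(self, workflow_name):
        return list(self._definitions[workflow_name])


class SequentialEngine:
    """Runs the tasks of a workflow in order, passing a context dict along."""

    def __init__(self, repository, task_registry):
        self.repository = repository
        self.task_registry = task_registry

    def run(self, workflow_name, context):
        for task_name in self.repository.tasks_for(workflow_name):
            context = self.task_registry[task_name](context)
        return context


# Hypothetical tasks for a two-step science processing pipeline.
def calibrate(context):
    return {**context, "values": [v * 2 for v in context["values"]]}


def summarize(context):
    return {**context, "total": sum(context["values"])}


repository = WorkflowRepository({"level1": ["calibrate", "summarize"]})
engine = SequentialEngine(repository, {"calibrate": calibrate, "summarize": summarize})
result = engine.run("level1", {"values": [1, 2, 3]})
```

The repository only answers "which tasks, in what order", while the engine only decides how to run them, so a different engine (parallel, distributed) could reuse the same definitions.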

Resource Management

• Job management

– Execution, monitoring, tracking

• Underlying software system and hardware resources

– e.g. disk space, computational resources, and shared identity

Resource Management (Cont)

• Critical objects

– Job, Job Input, Job Spec, Job Instance, Resource Node

Metadata

• A multi-valued, generic Metadata container class

• Internal map of string keys pointing to vectors of strings

– [std::string key] ⇒ std::vector of std::strings
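
A multi-valued container of this shape can be sketched in a few lines of Python. This is an assumption-laden illustration of the "string key to list of strings" idea, not the OODT Metadata class itself; the method names `add`, `replace`, `get_all`, and `get_first` are hypothetical.

```python
from collections import defaultdict


class Metadata:
    """Multi-valued container: each string key maps to a list of strings."""

    def __init__(self):
        self._map = defaultdict(list)

    def add(self, key, value):
        """Append a value to the key, keeping any earlier values."""
        self._map[key].append(str(value))

    def replace(self, key, values):
        """Discard the key's current values and store the given ones."""
        self._map[key] = [str(value) for value in values]

    def get_all(self, key):
        """Return a copy of every value stored under the key."""
        return list(self._map.get(key, []))

    def get_first(self, key):
        """Return the first value under the key, or None if it is absent."""
        values = self._map.get(key)
        return values[0] if values else None
```

For example, calling `add("Instrument", "A")` and then `add("Instrument", "B")` leaves both values under the same key, which is the behavior that distinguishes this container from a plain key-to-value map.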

Framework

• Catalog & Archive

• Common Utilities

• Grid

• Agility

Common Utilities

• Provide needed support for catalogs, archives, and grids

• Query Expression

– Platform-neutral and extensible way of posing questions

• Single Sign On

• Commons

– Lots of miscellaneous utilities, including I/O streams, logging, XML, and more

Query Expression

• Provide a way to express queries in a generic manner

• Use boolean postfix expressions to capture the domain, range, and constraint of a query, regardless of the source of the query

• Encapsulate the results of a query

– Standard way to pass a query and its results between servers, clients, nodes, and other components
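
To make the idea of a boolean postfix expression concrete, here is a minimal Python sketch of a stack-based evaluator. The token encoding (comparison tuples plus the strings "AND", "OR", "NOT") and the field names are assumptions for illustration; the slides do not specify OODT's actual query representation.

```python
def evaluate_postfix(tokens, record):
    """Evaluate a boolean postfix expression against a record (a dict).

    Tokens are either the operators "AND", "OR", "NOT" or
    (field, op, value) comparison tuples (an assumed encoding).
    """
    comparisons = {
        "EQ": lambda a, b: a == b,
        "LT": lambda a, b: a < b,
        "GT": lambda a, b: a > b,
    }
    stack = []
    for token in tokens:
        if token == "NOT":
            stack.append(not stack.pop())
        elif token in ("AND", "OR"):
            right = stack.pop()
            left = stack.pop()
            stack.append((left and right) if token == "AND" else (left or right))
        else:
            # A comparison: push True only if the field exists and matches.
            field, op, value = token
            stack.append(field in record and comparisons[op](record[field], value))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Infix: target == "Mars" AND (year > 2005 OR instrument == "CRISM")
QUERY = [
    ("target", "EQ", "Mars"),
    ("year", "GT", 2005),
    ("instrument", "EQ", "CRISM"),
    "OR",
    "AND",
]
```

In postfix form the operands come before their operator, so the expression needs no parentheses and any component that receives it can evaluate it with a single stack, whatever the source of the query.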

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Grid

• Profile (metadata) and Product (data) services

• Product

– Retrieves resources (products) in platform-neutral formats

• Profile

– Describes and discovers resources using extensible metadata called "profiles"

• Web Grid

– Provides profile and product services over a RESTful interface

• XML Product/Profile handlers

– Provide XML-configurable, database profile and product handlers

Product

• Provide access to data products

– Datasets, images, documents, or anything with an electronic representation

• Accept standard query expressions and return zero or more matching products

• Transform products from proprietary formats into Internet standard formats without impacting local stores or operations

Profile

• Describes and locates resources using metadata descriptions

– Resource's inception, composition, and location

• Catalogs metadata descriptions and provides creating, updating, and querying capabilities.

Framework

• Catalog & Archive

• Utilities

• Grid

• Agility

Agility

• Re-implementation of Grid in Python with a focus on high performance in the face of gargantuan data sets as well as accelerated development and integration into existing systems.

Questions