
Page 1

DMAC Data Integration

What is it really?

Why does it seem frozen in place?

How do we get it moving?

Steve Hankin (NOAA/PMEL)

DMAC = Data Management and Communications subsystem of the US Integrated Ocean Observing System (IOOS)


Page 2

June '07 OCO Annual Review 2

Part 1. A Short Digression (begging your indulgence …)

What’s new in the Observing System Monitoring Center (OSMC)

Page 3

Page 4

Page 5

Under the hood …

Metadata feeds from NOAAPort & GODAE

GODAE QC fields to be added next …

A feed from NCEP?

Goal:
– Compare QC strategies.
– Compare GTS filters and feeds.

Page 6

Part 2. DMAC Data Integration (DMAC = Data Management and Communications subsystem of IOOS)

Just what is DMAC “data integration”? (And what is it not?)

Start with a taxonomy through examples …

What is it really?

Why does it seem frozen in place?

How do we get it moving?

Page 7

An analogy: the electric power grid

Energy goes in. Energy comes out. Providers do not target specific consumers. They just adhere to standards (60 Hz).

Consumers are not aware of specific providers.

The analogy appears simplistic until you refine your concept of data: data must always be tightly bound to its metadata.

DMAC integration is a “data grid”

The concept of “integration” in DMAC

Analogy is simplistic?
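Here is a minimal Python sketch of the “data grid” idea, for illustration only. Providers and consumers share nothing but a common interface, which plays the role of 60 Hz, and every dataset carries its metadata with it. The names `Dataset`, `Provider`, `ProviderA`, `ProviderB`, and `consume` are hypothetical, as are the metadata keys and the placeholder values. The DMAC Plan does not prescribe this interface.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Dataset:
    """Values that never travel without their metadata (hypothetical structure)."""
    values: tuple[float, ...]
    metadata: dict[str, str] = field(default_factory=dict)


class Provider(Protocol):
    """The shared 'standard': the only thing providers and consumers agree on."""

    def get(self, variable: str) -> Dataset: ...


class ProviderA:
    def get(self, variable: str) -> Dataset:
        # Placeholder values; a real provider would read from its own archive.
        return Dataset((1.0, 2.0), {"variable": variable, "units": "degC", "source": "provider_a"})


class ProviderB:
    def get(self, variable: str) -> Dataset:
        # Placeholder values, as above.
        return Dataset((3.0,), {"variable": variable, "units": "degC", "source": "provider_b"})


def consume(grid: list[Provider], variable: str) -> list[Dataset]:
    """Draw from every provider on the grid without targeting any one of them."""
    return [provider.get(variable) for provider in grid]


for ds in consume([ProviderA(), ProviderB()], "sea_water_temperature"):
    print(ds.metadata["source"], ds.metadata["units"], ds.values)
```

Adding a third provider changes nothing on the consumer side, which is the point of the analogy.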

Page 8

The DMAC Plan (2004) is built around a “data grid” concept

(a.k.a. “data commons”)

Uniform services (standards)
– to interconnect existing systems

“Do no harm”

Existing standards are inadequate
An implementation plan, not a specification

240 pages

How far have we progressed?

Page 9

Honest answer: barely at all.

Why?

1. Formulation choices in the DMAC Plan

2. Political chaos

3. Community social structure

How do we overcome each of these obstacles?

How far has DMAC progressed since 2004?

Page 10

The DMAC Plan has detailed milestones, but they are not sufficiently tangible
– e.g. “publish a community standard for [xxx]”.

Solution: Reformulate the Plan as a sequence of tasks that each provide tangible benefits.

Obstacle 1: Formulation choices in the plan

Page 11

Dumb, bad-luck timing (post-9/11) & interagency coordination failures

lead to

Negligible direct funding (just enough for “volunteer” meetings)

(Note: millions have been made available elsewhere, generating additional demand for DMAC guidance.)

Solution: Better marketing. Map out a Plan that can be marketed to Gov’t managers.

Obstacle 2: Political chaos

Page 12

Obstacle 3: Community social structure

The diminutive nation of Science Data Management lies nestled among three neighbors:

1. IT Infrastructure
2. Computer Science
3. Science Research

Each is larger and more powerful, and each imposes its viewpoint on our small nation.

[Figure: “Data Mgmt” nestled among Science Research, Computer Science, and IT Infrastructure]

Page 13

Obstacle 3: Community social structure

3. Science/Research viewpoint:

“Reduce complexity by limiting the number of variables to be considered initially.” But data management challenges are largely independent of data content.

Analogy: would it reduce complexity in designing an ocean glider if it only had to measure temperature?

Data management simplifies by reducing the number of data structures (a.k.a. “data models”).

Page 14

Proposal: Build the DMAC integration framework as a collection of Virtual Data Assembly Centers (“V-DACs”), organized by data structure.

To be developed one by one:

1. Grids (models, satellites, climatologies)
2. Time series
3. Surface Tracks
4. Vertical Profiles and Sections
5. …, Scatters, Swaths, Radials, Polygons, …
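The claim is that data management simplifies by counting data structures, not variables. As a hypothetical Python illustration, this sketch routes a dataset to a V-DAC from its shape alone. The `Structure` members mirror the first four entries in the list above. The classification rules and the `classify` function are assumptions made for this example. Structures beyond these four (sections, swaths, radials, and so on) are deliberately left out.

```python
from enum import Enum


class Structure(Enum):
    GRID = "grid"
    TIME_SERIES = "time series"
    SURFACE_TRACK = "surface track"
    PROFILE = "vertical profile"


def classify(n_times: int, n_positions: int, n_depths: int, gridded: bool) -> Structure:
    """Pick a V-DAC from the dataset's shape; the measured variable plays no role.

    These rules are illustrative assumptions, not a published DMAC rule set.
    """
    if gridded:
        return Structure.GRID
    if n_positions == 1 and n_times == 1 and n_depths > 1:
        return Structure.PROFILE
    if n_positions == 1 and n_times > 1:
        return Structure.TIME_SERIES
    if n_positions > 1 and n_depths == 1:
        return Structure.SURFACE_TRACK
    raise ValueError("structure not covered by this sketch")


# Temperature and salinity from the same mooring share one shape, hence one V-DAC.
print(classify(n_times=8760, n_positions=1, n_depths=1, gridded=False))
print(classify(n_times=1, n_positions=1, n_depths=200, gridded=False))
```

A new variable from the same mooring adds no new code path. Only a new shape would.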

Page 15

[Diagram: TAO, BATS, OceanSites, U. Hawaii Sea Level Center, NDBC, and NODC linked through a time series protocol to a Time series V-DAC with its meta-data]

• bricks-and-mortar time series “curator” (funded)

• standard protocol(s) (“web services”)

• one access point

• multiple variables

Imagine the V-DAC for time series data
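Here is a rough Python sketch of the time series V-DAC as pictured. Providers speak one shared protocol and are registered behind a single access point. That access point serves multiple variables and stamps each result with where it came from. The `TimeSeriesProvider` protocol is a hypothetical stand-in for whatever web-service standard is eventually chosen. It and every other name here, including the toy provider and its placeholder values, are assumptions. None of it is meant to model TAO, NDBC, or any other real system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Protocol


@dataclass
class TimeSeries:
    station: str
    variable: str
    times: list[datetime]
    values: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


class TimeSeriesProvider(Protocol):
    """Hypothetical stand-in for the shared 'time series protocol'."""

    name: str

    def fetch(self, variable: str, start: datetime, end: datetime) -> list[TimeSeries]: ...


class TimeSeriesVDAC:
    """One access point over every registered provider."""

    def __init__(self) -> None:
        self._providers: list[TimeSeriesProvider] = []

    def register(self, provider: TimeSeriesProvider) -> None:
        self._providers.append(provider)

    def query(self, variable: str, start: datetime, end: datetime) -> list[TimeSeries]:
        results: list[TimeSeries] = []
        for provider in self._providers:
            for ts in provider.fetch(variable, start, end):
                # Record provenance so the metadata travel with the values.
                ts.metadata.setdefault("provider", provider.name)
                results.append(ts)
        return results


class InMemoryProvider:
    """Toy provider holding placeholder data, for illustration only."""

    def __init__(self, name: str, series: list[TimeSeries]) -> None:
        self.name = name
        self._series = series

    def fetch(self, variable: str, start: datetime, end: datetime) -> list[TimeSeries]:
        out: list[TimeSeries] = []
        for ts in self._series:
            if ts.variable != variable:
                continue
            # Keep only samples inside the requested time window.
            kept = [(t, v) for t, v in zip(ts.times, ts.values) if start <= t <= end]
            out.append(TimeSeries(ts.station, ts.variable,
                                  [t for t, _ in kept], [v for _, v in kept],
                                  dict(ts.metadata)))
        return out


t0 = datetime(2007, 6, 1)
vdac = TimeSeriesVDAC()
vdac.register(InMemoryProvider("provider_a", [
    TimeSeries("A1", "temperature", [t0, datetime(2007, 6, 2)], [20.0, 21.0])]))
vdac.register(InMemoryProvider("provider_b", [
    TimeSeries("B1", "salinity", [t0], [35.0])]))
for ts in vdac.query("temperature", t0, datetime(2007, 6, 1, 12)):
    print(ts.metadata["provider"], ts.station, ts.values)
```

The V-DAC only sees the protocol. A new provider joins by implementing `fetch`, and nothing changes for the consumer.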

Page 16

Also fund a metadata development activity:
– Data discovery
– Controlled vocabularies
– Data lineage
– Geo-referencing
– Instrument characterizations
– Quality control

Page 17

How do we build an ocean temperature V-DAC?

[Diagram: Time series, Profiles, and Grids V-DACs, each with its own meta-data, combined into a Temperature V-DAC with its own meta-data]

A single place to access all ocean temperature data
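The temperature V-DAC can be pictured as a thin layer over the structure-specific V-DACs. This Python sketch makes two assumptions: each structure V-DAC indexes its holdings by variable name, and the variable-centric V-DAC simply fans a query out to all of them. The classes and the source labels are illustrative. So is the use of a CF-style name such as `sea_water_temperature` as the key. None of these are part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    structure: str  # e.g. "time_series", "profile", "grid"
    source: str
    metadata: dict[str, str] = field(default_factory=dict)


class StructureVDAC:
    """Hypothetical V-DAC for one data structure, holding records by variable."""

    def __init__(self, structure: str) -> None:
        self.structure = structure
        self._holdings: dict[str, list[Record]] = {}

    def add(self, variable: str, source: str) -> None:
        record = Record(self.structure, source, {"variable": variable})
        self._holdings.setdefault(variable, []).append(record)

    def query(self, variable: str) -> list[Record]:
        return list(self._holdings.get(variable, []))


class VariableVDAC:
    """A variable-centric V-DAC (e.g. temperature) layered on structure V-DACs."""

    def __init__(self, variable: str, members: list[StructureVDAC]) -> None:
        self.variable = variable
        self.members = members

    def query(self) -> list[Record]:
        # Fan out to every structure V-DAC; each record keeps its own metadata.
        return [rec for member in self.members for rec in member.query(self.variable)]


ts, prof, grid = StructureVDAC("time_series"), StructureVDAC("profile"), StructureVDAC("grid")
ts.add("sea_water_temperature", "mooring_1")
ts.add("sea_water_salinity", "mooring_1")
prof.add("sea_water_temperature", "glider_7")
grid.add("sea_water_temperature", "model_run_x")

temperature = VariableVDAC("sea_water_temperature", [ts, prof, grid])
print([(r.structure, r.source) for r in temperature.query()])
# [('time_series', 'mooring_1'), ('profile', 'glider_7'), ('grid', 'model_run_x')]
```

A salinity V-DAC would reuse the same three structure V-DACs unchanged. That reuse is where building by structure first pays off.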

Page 18

The virtues of this approach:

Reductionism: one protocol at a time
A concrete deliverable at every step
Unites communities of interest (integration)

But can we market the idea to management? (Who has the ability to carry the message to management?)

The science community has a strong voice. (Much stronger than DM.)

Page 19

Discussion (Thank you)