UAF/OSMC Presenters: Kevin O’Brien and Eugene Burger Abstract: Kevin O’Brien and Eugene Burger...


Page 1: UAF/OSMC Presenters: Kevin O’Brien and Eugene Burger Abstract: Kevin O’Brien and Eugene Burger are from NOAA’s Pacific Marine Environmental Laboratory.

Improving Data Catalogs

Kevin O’Brien – University of Washington/JISAO, NOAA/PMEL

Roland Schweitzer – Weathertop Consulting

Eugene Burger – NOAA/PMEL

Page 2

The Unified Access Framework (UAF)

• A Global Earth Observation Integrated Data Environment (GEO-IDE) project

• An attempt to improve scientific data management and access

• Focus on successes

Page 3

Lots of data already available

Page 4

Projects: (too many to name)

Data formats: netCDF, GRIB, ASCII

Applications: Matlab, ArcGIS, Ferret, GrADS, Google Earth, IDV, LAS, ERDDAP, …

Users: (too many to name)

netCDF-CF-DAP-THREDDS-WMS

Page 5

Developing the UAF Catalog Cleaner

(a ‘web crawler’)

[Figure: the UAF ‘RAW’ catalog and the UAF ‘CLEAN’ catalog, each a tree of source catalogs grouped under NOAA (NMFS, OAR, NWS, NESDIS), NOAA Affiliated, IOOS National Partners and IOOS Regional Partners, including NODC, NGDC, GFDL, PMEL, AOML, PFEG, NDBC, ESRL, CoastWatch, NOMADS, NAVO, AOOS, NANOOS, CeNCOOS, SCCOOS, PacIOOS, GLOS, NERACOOS, MACOORA, SECOORA, CariCOOS and GCOOS.]

Page 6

Tree Crawl → Dataset Crawl → Cleaner

CatalogRef and dataset URLs

Raw catalog XML
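As a rough illustration of the tree-crawl stage, the Python sketch below parses raw THREDDS catalog XML, follows `catalogRef` links, and builds OPeNDAP URLs from each dataset's `urlPath`. It is not the Catalog Cleaner's code: the function names (`tree_crawl_step`, `crawl`) and the `fetch` callable that returns a catalog's XML are hypothetical, and it assumes each catalog declares at most one OPeNDAP service that applies to all of its datasets.

```python
from collections import deque
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespaces used by THREDDS InvCatalog 1.0 documents, in ElementTree's {uri} form.
THREDDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
XLINK = "{http://www.w3.org/1999/xlink}"


def tree_crawl_step(catalog_xml, catalog_url):
    """Return (child catalog URLs, OPeNDAP dataset URLs) for one raw catalog."""
    root = ET.fromstring(catalog_xml)

    # <catalogRef xlink:href="..."/> links are usually relative to the parent catalog.
    children = [
        urljoin(catalog_url, ref.get(XLINK + "href"))
        for ref in root.iter(THREDDS + "catalogRef")
        if ref.get(XLINK + "href")
    ]

    # Assumption: one OPeNDAP service (possibly nested in a Compound service) serves every dataset.
    bases = [
        svc.get("base", "")
        for svc in root.iter(THREDDS + "service")
        if svc.get("serviceType", "").lower() == "opendap"
    ]
    datasets = []
    if bases:
        for ds in root.iter(THREDDS + "dataset"):
            if ds.get("urlPath"):
                # e.g. base "/thredds/dodsC/" + urlPath "a/b.nc", resolved against the catalog URL
                datasets.append(urljoin(catalog_url, bases[0] + ds.get("urlPath")))
    return children, datasets


def crawl(start_url, fetch):
    """Breadth-first walk of a catalog tree; `fetch(url)` returns that catalog's XML text."""
    seen, queue, found = set(), deque([start_url]), []
    while queue:
        url = queue.popleft()
        if url in seen:  # guard against catalogs that reference each other
            continue
        seen.add(url)
        children, datasets = tree_crawl_step(fetch(url), url)
        found.extend(datasets)
        queue.extend(children)
    return found
```

Passing `fetch` in as a parameter keeps the crawl logic testable without network access; in practice it could wrap any HTTP client.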

Page 7

Tree Crawl → Dataset Crawl → Cleaner

url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/OCEAN_GEOSTROPHIC_CURRENTS/CURRENTS.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/GLOBAL_MONTHLY_CARBON_FLUXES/FLUXES.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/GLOBAL_SEASON_CARBON_FLUXES/FLUXES.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/ROMSMETEO/kk1.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/MCI_GULF/kk1.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/MSGSST/SST.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/TERRA_K490_GULF/terrak490.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/TERRA_K490_GULF_3D/terrak490.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199910.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199911.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199912.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200001.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200002.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200003.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200004.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200005.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200006.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200007.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200008.nc"

CatalogRef and dataset URLs

Page 8

Tree Crawl → Dataset Crawl → Cleaner

Aggregations

CF compliance

Access services

UAF Clean Catalog
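The "Access services" check can be pictured as a set difference between the services a catalog declares and the services the clean catalog is expected to offer. In this sketch the required set (`REQUIRED_SERVICES`) and the helper names are assumptions rather than the UAF rubric, and it only inspects catalog-level service declarations, not which service each dataset actually references.

```python
import xml.etree.ElementTree as ET

THREDDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical policy: service types every dataset in the clean catalog should offer.
REQUIRED_SERVICES = {"opendap", "wms", "wcs"}


def offered_services(catalog_xml):
    """Collect the (lower-cased) service types a catalog declares, ignoring Compound wrappers."""
    root = ET.fromstring(catalog_xml)
    return {
        svc.get("serviceType", "").lower()
        for svc in root.iter(THREDDS + "service")
        if svc.get("serviceType", "").lower() != "compound"
    }


def missing_services(catalog_xml):
    """Required service types the catalog does not declare, sorted for stable reports."""
    return sorted(REQUIRED_SERVICES - offered_services(catalog_xml))
```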

Page 9

UAF Clean Catalog

Page 10

How to provide feedback to data providers?

• Remember the “Building on Success” theme

• ncISO metadata assessment tool is very successful

Page 13

How about a catalog quality assessment tool?


Page 16

Statistics for the current catalog and all its children

Links to rubric reports for child catalogs
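A rubric report that shows statistics for the current catalog and all its children implies rolling per-catalog counts up the tree. A minimal sketch, assuming a hypothetical `CatalogReport` record whose counters (`datasets`, `cf_grids`, `missing_services`) are illustrative and not the tool's actual report fields:

```python
from dataclasses import dataclass, field


@dataclass
class CatalogReport:
    """Hypothetical per-catalog counts, as a rubric report might record them."""
    name: str
    datasets: int = 0          # datasets found directly in this catalog
    cf_grids: int = 0          # of those, datasets with CF-compliant grids
    missing_services: int = 0  # of those, datasets lacking a required service
    children: list["CatalogReport"] = field(default_factory=list)


def rollup(report: CatalogReport) -> dict[str, int]:
    """Totals for this catalog plus every descendant catalog (depth-first)."""
    totals = {
        "datasets": report.datasets,
        "cf_grids": report.cf_grids,
        "missing_services": report.missing_services,
    }
    for child in report.children:
        for key, value in rollup(child).items():
            totals[key] += value
    return totals
```

Each child catalog's own report can then link to its own `rollup` result, which matches the slide's "links to rubric reports for child catalogs".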

Page 17

Missing services

Data issues

Page 18

Page 19

Data issues

Original Catalog

Page 20

The catalog cleaner can...

1. Crawl a collection of catalogs and find all of the OPeNDAP endpoints.

2. Examine each endpoint and determine whether it has gridded, CF-compliant netCDF data.
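Step 2 can be approximated with a simple heuristic on variable metadata: a variable counts as gridded if two of its dimensions have coordinate variables recognisable as latitude and longitude through their CF units or `standard_name`. This is a deliberately simplified sketch (the function names and the dictionary input format are assumptions); a full check would also consider `axis` attributes, auxiliary coordinates, and the rest of the CF conventions.

```python
# CF-recognised latitude/longitude units, lower-cased for comparison.
LAT_UNITS = {"degrees_north", "degree_north", "degree_n", "degrees_n", "degreen", "degreesn"}
LON_UNITS = {"degrees_east", "degree_east", "degree_e", "degrees_e", "degreee", "degreese"}


def axis_kind(attrs):
    """Classify a coordinate variable as 'lat', 'lon' or None from its attributes."""
    units = str(attrs.get("units", "")).lower()
    std_name = attrs.get("standard_name", "")
    if units in LAT_UNITS or std_name == "latitude":
        return "lat"
    if units in LON_UNITS or std_name == "longitude":
        return "lon"
    return None


def gridded_variables(variables):
    """Names of variables that span both a latitude and a longitude coordinate axis.

    `variables` maps name -> {"dims": tuple of dimension names, "attrs": dict}.
    """
    # Coordinate variables are 1-D variables named after their own dimension.
    axes = {
        name: axis_kind(var["attrs"])
        for name, var in variables.items()
        if var["dims"] == (name,)
    }
    grids = []
    for name, var in variables.items():
        if var["dims"] == (name,):
            continue  # skip the coordinate variables themselves
        kinds = {axes.get(dim) for dim in var["dims"]}
        if {"lat", "lon"} <= kinds:
            grids.append(name)
    return grids
```

Against a live endpoint, the input dictionary could be filled with netCDF4-python, e.g. `{n: {"dims": v.dimensions, "attrs": {a: v.getncattr(a) for a in v.ncattrs()}} for n, v in netCDF4.Dataset(url).variables.items()}`, provided the library was built with OPeNDAP support.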

Page 21

The catalog cleaner can...

1. Report problems:
   a. No grids found that follow CF
   b. Unordered time axis
   c. Data access errors (underlying files missing, misconfigured gateways, etc.)

2. Detect unaggregated time series data

3. Detect missing services
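Two of these checks lend themselves to short sketches: a time coordinate that is not strictly monotonic (netCDF coordinate variables are expected to be), and runs of files whose URLs differ only by a date stamp, like the NARR `soill.YYYYMM.nc` files listed on page 7, which suggest an unaggregated time series. Both functions are hypothetical, and treating any run of six or more digits as a date stamp is an assumption that will miss other naming schemes.

```python
import re
from collections import defaultdict


def is_strictly_monotonic(values):
    """True if every step of the time axis goes in the same direction (no repeats)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Assumption: a run of 6+ digits in a URL is a date stamp such as YYYYMM or YYYYMMDD.
DATE_STAMP = re.compile(r"\d{6,}")


def unaggregated_groups(urls, min_files=2):
    """Group URLs that become identical once their date stamps are masked out."""
    groups = defaultdict(list)
    for url in urls:
        groups[DATE_STAMP.sub("{date}", url)].append(url)
    return {
        pattern: sorted(members)
        for pattern, members in groups.items()
        if len(members) >= min_files
    }
```

Each returned pattern is a candidate for a single aggregated dataset in place of the individual files.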

Page 22

The catalog cleaner can...

1. Write a new catalog with remote links to the data and with local versions of missing services.
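One way to picture this step: emit a THREDDS catalog in which each dataset wraps the remote OPeNDAP URL in an embedded NcML `netcdf` element, so a local server can offer services (here WMS and WCS) that the original server lacks. This is a sketch under stated assumptions: the NcML-wrapping approach, the service names and bases, the `uaf/<n>` path scheme, and the `clean_catalog` helper are illustrative rather than the Catalog Cleaner's actual output, and for brevity every dataset gets the same compound service instead of only its missing ones.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", THREDDS_NS)
ET.register_namespace("ncml", NCML_NS)

# Hypothetical local services; the bases follow typical TDS defaults.
LOCAL_SERVICES = [
    ("odap", "OPENDAP", "/thredds/dodsC/"),
    ("wms", "WMS", "/thredds/wms/"),
    ("wcs", "WCS", "/thredds/wcs/"),
]


def t(tag):
    """Qualify a tag name with the THREDDS catalog namespace."""
    return f"{{{THREDDS_NS}}}{tag}"


def clean_catalog(name, remote_urls):
    """Build a catalog whose datasets point (via embedded NcML) at remote OPeNDAP URLs."""
    root = ET.Element(t("catalog"), {"name": name})
    compound = ET.SubElement(root, t("service"),
                             {"name": "all", "serviceType": "Compound", "base": ""})
    for svc_name, svc_type, base in LOCAL_SERVICES:
        ET.SubElement(compound, t("service"),
                      {"name": svc_name, "serviceType": svc_type, "base": base})
    for i, url in enumerate(remote_urls):
        ds = ET.SubElement(root, t("dataset"), {
            "name": url.rsplit("/", 1)[-1],
            "ID": f"uaf-{i}",
            "urlPath": f"uaf/{i}",  # hypothetical local path scheme
        })
        ET.SubElement(ds, t("serviceName")).text = "all"
        # The remote OPeNDAP URL stays the data source; no data is copied locally.
        ET.SubElement(ds, f"{{{NCML_NS}}}netcdf", {"location": url})
    return ET.tostring(root, encoding="unicode")
```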

Page 23

The catalog cleaner can… but shouldn’t…

1. Construct an aggregation to run locally, accessing remote data via OPeNDAP.

Page 24

Why not...

1. Unacceptably poor data access performance.

2. No access to the local file system, so it cannot make a catalog that aggregates the files via configuration pointing to the local file system.

Page 25

What to do...

1. Use a modified version of the tool to assess the quality of a local catalog, i.e., CatalogCleaner → CatalogEvaluator.

2. Do the (not difficult) work locally to aggregate files where appropriate and turn on missing services.
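The local aggregation work in item 2 is typically a small NcML document that joins a directory of files along the time dimension. The sketch below generates one; the directory path, suffix, and helper name are placeholders, and in a TDS deployment the resulting `netcdf` element would normally be embedded in a dataset of the server's configuration catalog.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)


def join_existing_ncml(directory, suffix=".nc", dim_name="time"):
    """NcML that concatenates every matching file in `directory` along `dim_name`."""
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    # <scan> lets the server discover the files in the directory instead of listing each one.
    ET.SubElement(agg, f"{{{NCML_NS}}}scan", {"location": directory, "suffix": suffix})
    return ET.tostring(root, encoding="unicode")
```

For example, `join_existing_ncml("/data/narr/soill/")` would describe a single time series built from a directory of monthly files such as the NARR `soill` files, rather than exposing each month as a separate dataset.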

Page 26

Moving Forward…

• Welcome feedback on the rubric and the Catalog Cleaner tool

• Evolve the tool into an evaluation tool

• Extend the UAF master catalog beyond gridded files
  • Use ERDDAP to include in situ featureTypes
  • Build support for visualizing these in LAS

• Continue community outreach to improve catalogs

Page 27

Thank you!

UAF: geo-ide.noaa.gov

Catalog Cleaner code and documentation: http://ferret.pmel.noaa.gov/LAS/documentation/the-uaf-catalog-cleaner/

ERDDAP: upwell.pfeg.noaa.gov/erddap

THREDDS: www.unidata.ucar.edu/projects/THREDDS

netCDF: www.unidata.ucar.edu/netcdf

OPeNDAP: www.opendap.org

CF: cf-pcmdi.llnl.gov