Community Inventory of EarthCube Resources for Geoscience Interoperability
Data discovery is the most frequently cited issue in the executive summaries on the EarthCube web site.
CINERGI
Ilya Zaslavsky, Steve Richard and the CINERGI team
http://workspace.earthcube.org/cinergi
Goals
• Large inventory of high-quality information resources across disciplines, with traceable provenance, usable across EarthCube research scenarios: datasets, catalogs, vocabularies, information models, services, process models, repositories, etc.
• Make it open to the community
• Organize it to enable search and integration across domains and linking between information objects
• Plus links between resources, people/organizations, publications, models, workflows, software, activities, etc.
Approach
• Build on high-level resource inventory started at http://connections.earthcube.org
• Compile metadata for as many resources as we can (collect recommendations from geoscientists, harvest existing catalogs)
• Expose through simple search interface
• Use off-the-shelf technology: Geoportal, ISO metadata, CSW
• Make it accessible through EarthCube.org
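To illustrate what exposing the inventory through CSW could look like from a client's side, here is a minimal sketch that builds a CSW 2.0.2 GetRecords request in key-value encoding. The endpoint URL and the function name are hypothetical (the slides do not give the catalog's CSW address), and a real deployment may support a different set of query options.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: not taken from the slides.
CSW_ENDPOINT = "http://example.org/geoportal/csw"

def build_getrecords_url(keyword, max_records=10, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecords URL (KVP encoding) for a free-text search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    # urlencode percent-escapes the quotes, spaces and % signs in the constraint.
    return endpoint + "?" + urlencode(params)

url = build_getrecords_url("stratigraphy")
# Decode the query string again to show the constraint the server would see.
print(parse_qs(urlparse(url).query)["constraint"][0])
```

The same URL could be fetched with any HTTP client; the response would be an XML document of matching records.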
READINESS ASSESSMENT 1
Catalog Metadata
• M1 Has a data listing
• M2 Uses minimal metadata standard, such as Dublin Core
• M3 Uses metadata standard, such as FGDC, or INSPIRE
Catalog Search
• S1 Search Interface
• S2 Search API, not following a standard
• S3 Complies with OpenSearch API
• S4 Complies with OGC CSW API
Catalog Harvest
• H1 Has a harvest API
• H2 OAI API
• H3 OGC CSW API
Vocabulary – Control and Access
• V1 Uses controlled terminology
• V2 Community Managed Terminology
• V3 SPARQL
Vocabulary – Representation
• T1 Listing of terminology, such as web pages
• T2 Uses ontology or SKOS
Data Access API
• A1 Bulk download
• A2 Static URL
• A3 Web Service
Data Query API
• Q1 Simple query subset
• Q2 Complex query
• Q3 Processing Subset
Information Model, Conceptual
• C0 Unspecified
• C1 Domain/Conceptual Model using UML
• C2 Domain/Conceptual Model using UML based on OGC or ISO standards
Information Model as XML
• X1 XML Format. Schema may not be specified
• X2 XML Schema
Information Model as SQL
• S1 Provides an SQL Schema
Also evaluated: processing services; visualization services; community consensus efforts; identifier persistence
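One way to record such an assessment per resource is a small lookup keyed by category, which also avoids the clash between the two "S1" codes (Catalog Search and SQL). The sketch below is illustrative only: the level codes come from the slides, but the dictionary layout, the helper names, the example profile, and the assumption that codes within a category are ordered from least to most capable are not from the source.

```python
# Level codes copied from the readiness assessment; layout is an assumption.
LEVELS = {
    "catalog_metadata": ["M1", "M2", "M3"],
    "catalog_search": ["S1", "S2", "S3", "S4"],
    "catalog_harvest": ["H1", "H2", "H3"],
    "vocabulary_control": ["V1", "V2", "V3"],
    "vocabulary_representation": ["T1", "T2"],
    "data_access": ["A1", "A2", "A3"],
    "data_query": ["Q1", "Q2", "Q3"],
}

def readiness_rank(category, code):
    """Return the 1-based position of a code in its category, or 0 if unassessed."""
    if code is None:
        return 0
    return LEVELS[category].index(code) + 1

def harvestable_via_csw(profile):
    """True when the resource was assessed as exposing an OGC CSW harvest API (H3)."""
    return profile.get("catalog_harvest") == "H3"

# Hypothetical assessment of one resource.
profile = {
    "catalog_metadata": "M3",
    "catalog_search": "S3",
    "catalog_harvest": "H3",
    "data_access": "A2",
}

ranks = {cat: readiness_rank(cat, profile.get(cat)) for cat in LEVELS}
```

Categories missing from a profile rank as 0, so partially assessed resources can still be compared.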
High-level inventory and readiness assessment: viewer
http://connections.earthcube.org
Staging Database
Document processing components
Harvest adapters
Public access components
Harvest adapters: components that connect to information sources and import descriptions of EarthCube resources into the staging database.
Staging Database: document database that persists the originally harvested descriptions in their native state, as well as any additional information or updates resulting from subsequent processing/curation of the description.
Document processing components: components that pull documents from the staging database and perform various functions to upgrade content or transform presentation. The processed document may be pushed back to the staging database or out to the public access components.
Public access components: components that connect to document processors and implement external interfaces to present content for users
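The four component types described above can be sketched as a small in-memory pipeline. This is only an illustration under stated assumptions: the class and function names, the spreadsheet adapter, and the title-trimming processor are all hypothetical, and a real staging database would be a persistent document store rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class StagedDocument:
    doc_id: str
    native: dict                       # description exactly as harvested
    current: dict                      # latest processed version
    history: List[str] = field(default_factory=list)  # processing steps applied

class StagingDB:
    """Keeps the native description alongside any processed updates."""
    def __init__(self) -> None:
        self._docs: Dict[str, StagedDocument] = {}

    def store_harvested(self, doc_id: str, description: dict) -> None:
        self._docs[doc_id] = StagedDocument(doc_id, dict(description), dict(description))

    def documents(self) -> List[StagedDocument]:
        return list(self._docs.values())

def harvest(adapter, db):
    """Harvest adapter role: import descriptions from one source into staging."""
    for doc_id, description in adapter():
        db.store_harvested(doc_id, description)

def process(db, processors):
    """Document processing role: pull, upgrade, and push back to staging."""
    for doc in db.documents():
        for step in processors:
            doc.current = step(dict(doc.current))
            doc.history.append(step.__name__)

def publish(db):
    """Public access role: present the processed descriptions."""
    return [doc.current for doc in db.documents()]

def spreadsheet_adapter():             # hypothetical source
    yield "r1", {"title": "  Stream gauge network ", "source": "workshop list"}

def trim_title(description):           # hypothetical processor
    description["title"] = description["title"].strip()
    return description

db = StagingDB()
harvest(spreadsheet_adapter, db)
process(db, [trim_title])
public_records = publish(db)
```

Keeping `native` untouched while `current` evolves mirrors the requirement that the staging database persist descriptions in their native state together with later updates.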
Interfaces to the world
Resource descriptions
Ye Most Excellent EarthCube Inventory System
Then add features
• Links to organizations, researchers, other systems
• Validation Services
• Deep registration of datasets/databases (at feature level)
• Data search capabilities
• Quality/interop readiness assessment
• Annotation system
CINERGI Outline (without deep registration so far)
Publication
Staging and curation
Harvesting
Geoportal
CSW, ISO 19115, ATOM, GeoRSS, etc.
Linked data: RDF, RDF store, e.g. Neo4j
Extra metadata, provenance, links, annotations
WAF w/XML ISO
Staging DB: MDB, MongoDB, CouchDB, Geoportal, etc.
ISO, DC, other
CSW, OAI-PMH, WAF, CKAN, other
DISCO
Validated triples
1. Metadata validation per record
2. Triggering parsers depending on metadata and validation results
Spatial parser
Person/org parser
LOD parser
Keyword parser
Topic parser
Time parser
3.
4. Finding ambiguities for manual curation
Need a parser API so parsers can be added
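A pluggable parser API along these lines could be sketched as a registry in which each parser declares when it applies, so that validation results (step 1) decide which parsers are triggered (step 2). Everything here is an assumption made for illustration: the required fields, the record layout, and the function names are hypothetical and not the CINERGI design.

```python
PARSERS = []  # registry of (name, applies, func) tuples

def register_parser(name, applies):
    """Decorator that adds a parser and its trigger condition to the registry."""
    def decorator(func):
        PARSERS.append((name, applies, func))
        return func
    return decorator

def validate(record):
    """Step 1: report which (hypothetical) required fields are missing or empty."""
    required = ("title", "keywords", "bbox")
    return {"missing": [f for f in required if not record.get(f)]}

@register_parser("keyword", applies=lambda rec, val: "keywords" not in val["missing"])
def keyword_parser(record):
    # Split a comma-separated keyword string into normalized terms.
    return {"keywords": [k.strip().lower() for k in record["keywords"].split(",")]}

@register_parser("spatial", applies=lambda rec, val: "bbox" not in val["missing"])
def spatial_parser(record):
    # Parse "west,south,east,north" into numeric bounds.
    west, south, east, north = (float(v) for v in record["bbox"].split(","))
    return {"bbox": {"west": west, "south": south, "east": east, "north": north}}

def run_parsers(record):
    """Step 2: trigger only the parsers whose condition holds for this record."""
    validation = validate(record)
    results = {}
    for name, applies, func in PARSERS:
        if applies(record, validation):
            results[name] = func(record)
    return validation, results

validation, results = run_parsers({"title": "River data", "keywords": "Rivers, Discharge "})
```

Adding a new parser, for example for time extents, would then only require one more decorated function.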
Duplicate detection, tagging, grouping
Curation UI
Results of parsing
Provenance
Duplicate flags
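The duplicate detection and duplicate flags mentioned above could, in the simplest case, rest on grouping records by a normalized key. The following is a minimal sketch under that assumption; the choice of title plus URL as the key, the normalization rules, and the sample records are hypothetical, and real catalogs would likely need fuzzier matching.

```python
import re
from collections import defaultdict

def duplicate_key(record):
    """Normalize title and URL so trivially different copies compare equal."""
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    url = record.get("url", "").lower().rstrip("/")
    url = re.sub(r"^https?://(www\.)?", "", url)
    return (title, url)

def group_duplicates(records):
    """Return lists of record ids that share the same normalized key."""
    groups = defaultdict(list)
    for rec in records:
        groups[duplicate_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records harvested from different registries.
records = [
    {"id": "a", "title": "Global River Chemistry", "url": "http://www.example.org/grc/"},
    {"id": "b", "title": "global river chemistry.", "url": "https://example.org/grc"},
    {"id": "c", "title": "Stratigraphy database", "url": "http://example.org/strat"},
]

flagged = group_duplicates(records)
```

Groups produced this way are candidates for manual review in a curation interface rather than automatic merges.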
Search UI
Reporting to sources
Pivot for search results
Harvesting dashboard
Record editor
Community pivots
Hot page
Search in domain systems
geoportal
pivotDB
Challenges
• Scope
• Different levels of granularity
• Lack of formal information models
• Implicit domain semantics
• Multiple metadata registry platforms and standards
• Lots of data outside managed repositories
• Cross-domain governance vs domain systems
• Different expectations across domains (survey)
Initial inventory
http://metadata.earthcube.org
Resources from domain workshops and surveys + initial harvesting
Domain inventories: you are invited to participate!
• All sources of data mentioned at domain end-user workshops are included
• Working with funded RCNs
Step 1: Prepare an initial collection in a spreadsheet.
Step 2: CINERGI will set up your community resource viewer and editing system, seeded with your collection.
Step 3: Community editing, updates and curation.
Short questionnaire
Columns: Function; Importance (1 = Unimportant to 7 = Essential, or NA / DK); Comments
Functions rated:
• Making metadata from your facility available for search using standard metadata, via standard APIs
• Tracking demand for and cross-domain usage of your resources
• Identifying issues related to data and metadata quality and completeness
• Tracking search hits that become searches for resources managed by your data facility
• Connecting owners of relevant datasets to your facility for potential longer-term data management
• Connecting data from your facility with people, publications, models, and projects
• Identifying communities using data, tools, and models from your facility
• Validating published metadata and service signatures from your facility
• Finding and reporting to you resources that appear as duplicates across multiple registries
Potential added value by a cross-domain system
Integration with cross-domain search
Key characteristics for CINERGI
See CINERGI Survey at
http://workspace.earthcube.org/data-facilities
Development Team
• San Diego Supercomputer Center/UCSD: Ilya Zaslavsky, David Valentine, Tom Whitenack, Amarnath Gupta, Jeff Grethe (NIF project)
• Lamont/Columbia Univ./IEDA: Kerstin Lehnert, Leslie Hsu
• Arizona Geological Survey: Stephen Richard
• University of Chicago: Tanu Malik
• Open Geospatial Consortium: Luis Bermudez
Community Partners
• Anthony Aufdenkampe: Critical Zone Observatories
• Shanan Peters: stratigraphy
• Bernhard Peucker-Ehrenbrink: Global River Observatories
• RCN projects that plan to organize community resources
• Test Enterprise Governance
• Building Blocks projects working on web services, brokering solutions
• Agencies
• International