SEEK EcoGrid: Integrate diverse data networks from ecology, biodiversity, and environmental...


SEEK EcoGrid

Integrate diverse data networks from ecology, biodiversity, and environmental sciences

Metacat, DiGIR, SRB, Xanthoria, ...
EML is the core for data documentation
Open programming interface


EcoGrid client interactions


Aims of EcoGrid

Which, Where, How, Who?
Share Data and Information
Relate Data from multiple projects/groups
Crosswalks across data structures
Develop Eco-related Finding Aids for Data
Global User: Authenticate and Authorize
Provide an infrastructure for “Archivable Collection-building” for SEEK scientists
Facilitate the A&M layer and the SMS layer


Challenges of EcoGrid

Data & User Diversity
  6000+ datasets & 1500+ scientists
  Diverse themes, methods, units, structures
  Small data sizes but high complexity (metadata)
Multiple Data Organizations
  Biodiversity surveys, population data
  GIS, satellite images, weather data, …
Ontologies & Taxonomies
Data Discovery: no single place to find data
Data Entropy: rapid decline of information about data over time
Autonomy with centralized access
Leverage Computational Grid work


Existing Services

Metacat: syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, and data/metadata versioning; supports any XML-based metadata

Xanthoria: common-schema mediator (currently 8 sites); metadata query/insert/update/delete for any XML schema, passed to the underlying metadata database (SQL, native XML)
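A common-schema mediator accepts one query phrased against a shared schema and translates it for each underlying store. The Python sketch below is a hypothetical illustration of that idea, not Xanthoria's actual API: the class names, the single-field "common schema" (a title substring), the SQL table layout, and the XML layout are all assumptions. One backend is relational (sqlite3, whose LIKE is case-insensitive for ASCII by default) and one keeps native XML (xml.etree.ElementTree).

```python
import sqlite3
import xml.etree.ElementTree as ET


class SqlStore:
    """Hypothetical backend: metadata rows in a relational table."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE meta (doc_id TEXT, title TEXT)")

    def insert(self, doc_id, title):
        self.conn.execute("INSERT INTO meta VALUES (?, ?)", (doc_id, title))

    def find_by_title(self, text):
        # Translate the common-schema query into SQL.
        rows = self.conn.execute(
            "SELECT doc_id FROM meta WHERE title LIKE ?", (f"%{text}%",)
        )
        return [r[0] for r in rows]


class XmlStore:
    """Hypothetical backend: metadata kept as native XML documents."""

    def __init__(self):
        self.root = ET.Element("collection")

    def insert(self, doc_id, title):
        doc = ET.SubElement(self.root, "doc", id=doc_id)
        ET.SubElement(doc, "title").text = title

    def find_by_title(self, text):
        # Translate the same query into a walk over the XML tree.
        return [
            d.get("id")
            for d in self.root.findall("doc")
            if text.lower() in (d.findtext("title") or "").lower()
        ]


class Mediator:
    """Fans one common-schema query out to every registered store."""

    def __init__(self, stores):
        self.stores = stores

    def query_title(self, text):
        hits = []
        for store in self.stores:
            hits.extend(store.find_by_title(text))
        return sorted(hits)


sql_store, xml_store = SqlStore(), XmlStore()
sql_store.insert("knb.1", "Soil moisture at West Valley")
xml_store.insert("lter.7", "Bird counts and soil cores")
xml_store.insert("lter.8", "Stream chemistry")
print(Mediator([sql_store, xml_store]).query_title("soil"))  # ['knb.1', 'lter.7']
```

The client never sees which store answered; adding a new site means writing one more find_by_title translation rather than changing the query.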


Existing Systems

DiGIR: querying of arbitrary XML-describable resources (the underlying data sources can be of any type: RDB, XMLDB)

ClimDB: integrates climate data in diverse formats by wrapping each data source; access through the web; a common schema (tabular description) identified beforehand

HyperLTER: summary ontology stored as metadata for images; image extraction, geographic subsetting, and band-level subsetting; integration with MODIS images, hyperspectral images, TM images, air photos, …
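ClimDB's pattern of wrapping each data source and mapping it onto a common tabular schema can be sketched as follows. This is a minimal Python illustration, not ClimDB's harvester: the two site formats (comma-separated with date,tmax_f in degrees Fahrenheit; semicolon-separated with day;max_temp_c) and the common schema (site, date, tmax_c) are hypothetical.

```python
import csv
import io


def wrap_site_a(text):
    """Hypothetical site A: comma-separated, maximum temperature in degrees F."""
    for row in csv.DictReader(io.StringIO(text)):
        tmax_c = (float(row["tmax_f"]) - 32) * 5 / 9
        yield ("A", row["date"], round(tmax_c, 1))


def wrap_site_b(text):
    """Hypothetical site B: semicolon-separated, already in degrees C."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield ("B", row["day"], float(row["max_temp_c"]))


def integrate(*sources):
    """Concatenate wrapped rows; every row now matches (site, date, tmax_c)."""
    return [row for source in sources for row in source]


site_a = "date,tmax_f\n2003-04-01,68\n"
site_b = "day;max_temp_c\n2003-04-01;21.5\n"
for row in integrate(wrap_site_a(site_a), wrap_site_b(site_b)):
    print(row)
# ('A', '2003-04-01', 20.0)
# ('B', '2003-04-01', 21.5)
```

Because the common schema is fixed beforehand, each wrapper only has to know its own source; the integration step itself stays trivial.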


Existing Systems

VegBank: three databases (co-occurrence records, a concept-driven species taxonomic database, and community classification). Distributed VegBank, querying by plots. Query/insert/update/annotate across the three diverse databases, which are described using XML

SRB: access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata; annotations for data; operations on data; metadata extraction; ingest, bulk ingest, delete, and update of data/metadata
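Querying on user-defined metadata means clients can attach their own attributes to stored objects and later select objects by them. The sketch below is a hypothetical in-memory model of that idea, not the SRB client API: metadata is assumed to be (attribute, value, unit) triples, and the object paths and attribute names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    path: str
    # User-defined metadata, assumed here to be (attribute, value, unit) triples.
    avus: list = field(default_factory=list)


def query(objects, attribute, predicate):
    """Return paths of objects having `attribute` with a value satisfying `predicate`."""
    return [
        obj.path
        for obj in objects
        if any(attr == attribute and predicate(value)
               for attr, value, _unit in obj.avus)
    ]


layers = [
    DataObject("/home/seek/precip.tif", [("resolution", 1.0, "km"), ("variable", "precip", "")]),
    DataObject("/home/seek/elev.tif", [("resolution", 0.03, "km")]),
]
print(query(layers, "resolution", lambda v: v <= 0.1))  # ['/home/seek/elev.tif']
```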


EcoGrid Services

Query: search metadata and data; return result sets with IDs

Read: retrieve data objects by ID

Authentication: verify user identity

Authorization: record allowable interactions

Write: write data objects by ID

Replication: mirror objects for backup and efficiency

Computation: execute models and simulations from AMS on various nodes
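The service list can be read as an interface that each EcoGrid node implements. Below is a minimal Python sketch assuming hypothetical method names (query, read, write, authenticate, authorize) and an in-memory node; the EcoGrid interfaces themselves were to be defined in WSDL, so this only illustrates the shape, not that WSDL. Replication and Computation are left out to keep it short, and plain-text passwords are for illustration only.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical interface mirroring part of the EcoGrid service list."""

    @abstractmethod
    def authenticate(self, user, password): ...

    @abstractmethod
    def authorize(self, user, action, object_id): ...

    @abstractmethod
    def query(self, text): ...

    @abstractmethod
    def read(self, object_id): ...

    @abstractmethod
    def write(self, user, object_id, data): ...


class InMemoryNode(EcoGridNode):
    def __init__(self, users, acl):
        self.users = users    # user -> password (illustration only)
        self.acl = acl        # allowed (user, action) pairs
        self.objects = {}     # object ID -> data object

    def authenticate(self, user, password):
        return user in self.users and self.users[user] == password

    def authorize(self, user, action, object_id):
        return (user, action) in self.acl

    def query(self, text):
        # Return IDs only; clients fetch contents with read().
        return sorted(i for i, d in self.objects.items() if text in d)

    def read(self, object_id):
        return self.objects[object_id]

    def write(self, user, object_id, data):
        if not self.authorize(user, "write", object_id):
            raise PermissionError(f"{user} may not write {object_id}")
        self.objects[object_id] = data


node = InMemoryNode(users={"jones": "secret"}, acl={("jones", "write")})
if node.authenticate("jones", "secret"):
    node.write("jones", "soil.1.1", "Soil cores, West Valley")
print(node.query("Soil"))     # ['soil.1.1']
print(node.read("soil.1.1"))  # Soil cores, West Valley
```

Returning only IDs from query and fetching objects with read matches the split between the Query and Read services above.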


EcoGrid Search Interactions

Features
  Well-defined interfaces (e.g., WSDL)
  Standardized messaging formats
  Automated discovery of implementing services
  Aggregation/Indexing across nodes for efficiency
  Support for heterogeneous data objects via metadata descriptions
  Lightweight to implement for various systems like DiGIR and Metacat

[Figure: search interactions among a Client, a Registry, and several QueryService nodes. Steps: 1. Register; 2. Find Query Nodes; 3. Search (recursive).]
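The numbered search interactions (register, find query nodes, recursive search) can be modeled in a few lines. In this hypothetical Python sketch, each query service registers itself with a registry, a client asks the registry for query nodes, and a search at one node is forwarded to its peers. The names Registry, QueryService, and peers are assumptions, and the visited set is one assumed way to keep recursion from looping; the slide does not say how recursion is bounded.

```python
class Registry:
    def __init__(self):
        self.services = []

    def register(self, service):           # 1. Register
        self.services.append(service)

    def find_query_nodes(self):            # 2. Find Query Nodes
        return list(self.services)


class QueryService:
    def __init__(self, name, records, registry):
        self.name = name
        self.records = records             # record ID -> title
        self.peers = []                    # nodes this one forwards searches to
        registry.register(self)

    def search(self, text, visited=None):  # 3. Search (recursive)
        visited = set() if visited is None else visited
        if self.name in visited:
            return set()
        visited.add(self.name)
        hits = {rid for rid, title in self.records.items() if text in title}
        for peer in self.peers:
            hits |= peer.search(text, visited)
        return hits


registry = Registry()
a = QueryService("metacat", {"knb.1": "soil cores"}, registry)
b = QueryService("digir", {"dg.9": "soil fauna"}, registry)
c = QueryService("srb", {"srb.4": "elevation"}, registry)
a.peers = [b, c]
b.peers = [a]                              # a cycle that the visited set breaks

entry = registry.find_query_nodes()[0]
print(sorted(entry.search("soil")))        # ['dg.9', 'knb.1']
```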


EcoGrid Index Interactions

[Figure: index interactions among a Client, a Registry, several QueryService nodes, and an IndexedQueryService. Steps: 1. Register; 2. Find Query Nodes; 3. Search (recursive); 4. Read (recursive); 5. Find Index Nodes; 6. Search.]
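The index interactions add an IndexedQueryService that harvests the query nodes ahead of time (search and read) so that later client searches hit one aggregated index instead of fanning out. A minimal Python sketch, assuming hypothetical class names and a simple inverted index over lower-cased title words:

```python
from collections import defaultdict


class Node:
    """Hypothetical query node holding record ID -> title."""

    def __init__(self, records):
        self.records = records

    def search(self, text):                # 3. Search
        return [rid for rid, title in self.records.items() if text in title]

    def read(self, rid):                   # 4. Read
        return self.records[rid]


class IndexedQueryService:
    def __init__(self):
        self.index = defaultdict(set)      # word -> {(node position, record ID)}

    def build(self, nodes):
        # Harvest every record once; searching for "" matches every title.
        for pos, node in enumerate(nodes):
            for rid in node.search(""):
                for word in node.read(rid).lower().split():
                    self.index[word].add((pos, rid))

    def search(self, word):                # 6. Search, answered from the index only
        return sorted(rid for _pos, rid in self.index.get(word.lower(), set()))


nodes = [Node({"knb.1": "Soil cores"}), Node({"dg.9": "Soil fauna", "dg.10": "Birds"})]
idx = IndexedQueryService()
idx.build(nodes)
print(idx.search("soil"))                  # ['dg.9', 'knb.1']
```

The trade-off is freshness: the index answers quickly but only reflects the nodes as of the last build.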


Authentication and Authorization

KNB uses a simple LDAP system with referrals
  Leverages existing databases (e.g., the LTER personnel DB)
  Not really scalable in terms of administration

Grid Security Infrastructure (GSI)
  Certificate-based authentication
  Proxy certificates allow delegation of rights
  Decentralized administration (i.e., multiple CAs)

Can we easily transition to GSI?
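The LDAP-with-referrals arrangement can be pictured without a real directory. In this toy Python model (no LDAP library, and not the LDAP wire protocol, where a referral is an LDAP URL the client chases), each hypothetical server owns one base DN and, for DNs under another base it knows about, returns a referral to the owning server. The DNs and server setup are made up.

```python
class DirectoryServer:
    """Toy LDAP-like server: owns one base DN and refers other lookups onward."""

    def __init__(self, base_dn, entries):
        self.base_dn = base_dn
        self.entries = entries   # DN -> attributes
        self.referrals = {}      # base DN -> DirectoryServer that owns it

    def lookup(self, dn):
        if dn.endswith("," + self.base_dn):
            return "entry", self.entries.get(dn)
        for base, server in self.referrals.items():
            if dn.endswith("," + base):
                return "referral", server
        return "entry", None


def resolve(server, dn, max_hops=5):
    """Follow referrals from the starting server until an answer is found."""
    for _ in range(max_hops):
        kind, result = server.lookup(dn)
        if kind == "entry":
            return result
        server = result
    raise RuntimeError("too many referrals")


site_a = DirectoryServer("o=siteA,dc=example,dc=org",
                         {"uid=jones,o=siteA,dc=example,dc=org": {"cn": "Jones"}})
site_b = DirectoryServer("o=siteB,dc=example,dc=org",
                         {"uid=smith,o=siteB,dc=example,dc=org": {"cn": "Smith"}})
site_a.referrals["o=siteB,dc=example,dc=org"] = site_b
print(resolve(site_a, "uid=smith,o=siteB,dc=example,dc=org"))  # {'cn': 'Smith'}
```

Every server's referral table has to be kept in sync by hand, which is one way to see the administrative burden the slide mentions.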


Native Range Prediction Workflow

Slide from D. Pennington

[Figure: workflow diagram. Inputs retrieved via EcoGrid Query: KNB abundance data (a1), DiGIR species presence & absence points (a2), and SRB environmental layers (b). Processing steps: Layer Integration, Sample, Data Calculation, Map Validation. Intermediate products: integrated layers (native range) (c), training sample and test sample (d), GARP rule set (e), native range prediction map (f), and model quality parameter (g). Also shown: the User and the EcoGrid Archive.]
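The sample, model, and map-validation steps follow a hold-out pattern. The Python sketch below illustrates that pattern on synthetic data only: presence points are split into training and test samples; a simple per-layer min/max environmental envelope is swapped in for GARP (it is not GARP); and the model quality parameter is assumed, for illustration, to be the fraction of held-out presences that fall inside the predicted range.

```python
import random


def split_sample(points, test_fraction, seed=0):
    """Shuffle presence points and split them into (training, test) samples."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_envelope(train, layers):
    """Stand-in for GARP (not GARP): per-layer (min, max) over training points."""
    return {
        name: (min(values[p] for p in train), max(values[p] for p in train))
        for name, values in layers.items()
    }


def predict(envelope, layers, cell):
    """A cell is inside the predicted range if every layer value lies in its envelope."""
    return all(lo <= layers[name][cell] <= hi for name, (lo, hi) in envelope.items())


def quality(envelope, layers, test):
    """Assumed quality parameter: share of held-out presences predicted present."""
    return sum(predict(envelope, layers, p) for p in test) / len(test)


# Synthetic 10 x 10 grid with two environmental layers (illustration only).
cells = [(x, y) for x in range(10) for y in range(10)]
layers = {"temp": {c: c[0] for c in cells}, "precip": {c: c[1] for c in cells}}
presences = [(x, y) for x in range(2, 6) for y in range(3, 7)]

train, test = split_sample(presences, test_fraction=0.25)
env = fit_envelope(train, layers)
range_map = {c: predict(env, layers, c) for c in cells}  # prediction map
print(len(train), len(test), quality(env, layers, test))
```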


Implementation

Short-term
  Define common WSDL services
  Simple service registry
  Wrappers for Metacat, DiGIR, SRB, Xanthoria, etc.

Medium-term
  Use OGSI-compliant interfaces (add methods to current WSDL)
  Grid Registry service


Timing

April 4
April 11 -- Design diagrams
April 18 -- WSDL, Registry instance operational, query + read, RSIDS schema and examples
April 25
May 2
May 9 -- Wrapper implementations + test client(s)
May 16 (SEEK Technical WG meeting)
May 23
May 30 -- Hard deadline for implementation of EcoGrid alpha 1


Query Messages

<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>
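A query message like this one can be evaluated by walking its AND/OR tree. This Python sketch parses the message with xml.etree.ElementTree, turns SQL-style LIKE patterns ('%' wildcard) into regular expressions, and tests them against a flat dictionary of metadata values. Several simplifications are assumptions, not EcoGrid behavior: concept names such as eml:title are used directly as dictionary keys (a real node would resolve them against the EML structure), matching is case-insensitive, and the '_' single-character wildcard is not handled.

```python
import re
import xml.etree.ElementTree as ET

QUERY = """<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>"""


def like(pattern, value):
    """SQL-style LIKE where '%' matches any run of characters (case-insensitive here)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


def evaluate(node, doc):
    """Recursively evaluate AND/OR/condition elements against a metadata dict."""
    if node.tag == "AND":
        return all(evaluate(child, doc) for child in node)
    if node.tag == "OR":
        return any(evaluate(child, doc) for child in node)
    if node.tag == "condition" and node.get("operator") == "LIKE":
        values = doc.get(node.get("concept"), [])
        return any(like(node.text, v) for v in values)
    raise ValueError(f"unsupported element: {node.tag}")


root = ET.fromstring(QUERY)
criteria = root.find("AND")  # child elements carry no namespace
doc = {"eml:title": ["Soil data from West Valley, 1983"],
       "eml:surName": ["Jones", "Smith"]}
print(evaluate(criteria, doc))  # True
```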


Result Responses

<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs='ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1'>
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1" xmlns:eml='eml://ecoinformatics.org/eml-2.0.0'>
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator>
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <creator>
          <individualName><surName>Smith</surName></individualName>
        </creator>
        <keywordSet>
          <keyword>aves</keyword>
          <keyword>ornithology</keyword>
          <keyword>biodiversity</keyword>
        </keywordSet>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
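On the client side, a result set like this can be unpacked with the standard library. A minimal Python sketch, assuming the response text is already in hand (transport is not shown) and using an abbreviated copy of the example (keywords omitted). Note that recordCount (86) is the total number of matches while this records element carries only record 1, presumably paged via startRecord/endRecord.

```python
import xml.etree.ElementTree as ET

RS = "ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1"
EML = "eml://ecoinformatics.org/eml-2.0.0"

RESPONSE = f"""<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs="{RS}">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1" xmlns:eml="{EML}">
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator><individualName><surName>Jones</surName></individualName></creator>
        <creator><individualName><surName>Smith</surName></individualName></creator>
      </eml:eml>
    </record>
  </records>
</rs:resultset>"""


def parse_resultset(text):
    """Return (total match count, list of record summaries)."""
    root = ET.fromstring(text)
    # Only the root and <eml:eml> are namespaced; other elements have no namespace.
    total = int(root.findtext("resultsetMetadata/recordCount"))
    records = []
    for rec in root.iter("record"):
        pkg = rec.find(f"{{{EML}}}eml")
        records.append({
            "id": rec.get("identifier"),
            "title": pkg.findtext("title"),
            "creators": [s.text for s in pkg.iter("surName")],
        })
    return total, records


total, records = parse_resultset(RESPONSE)
print(total)       # 86
print(records[0])  # {'id': 'bar.1.23', 'title': 'Soil data from West Valley, 1983', 'creators': ['Jones', 'Smith']}
```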