Science Environment for Ecological Knowledge Bertram Ludäscher San Diego Supercomputer Center...


Science Environment for Ecological Knowledge

Bertram Ludäscher

San Diego Supercomputer Center, University of California, San Diego

http://seek.ecoinformatics.org

UC Santa Barbara

UC San Diego

U New Mexico

U Kansas

Vermont, Napier, ASU, UNC


SEEK Overview, 3/2004 2

Architecture Overview

• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – application-/upperware
• Semantic Mediation System
  – Data Integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware
• EcoGrid
  – Access to ecology data and tools
  – middle-/underware
• Plus Working Groups:
  – Knowledge Representation (SEEK-KR)
  – Classification and Nomenclature (TAXON)
  – Biodiversity and Ecological Analysis and Modeling (BEAM)

(cf. GEON + Cyberinfrastructure)


SEEK EcoGrid

• Goal: standardize interfaces (using web and grid services)
  – We have standardized data via EML
  – Integrate diverse data networks from ecology, biodiversity, and environmental sciences
• Grid-standardized interfaces
  – Uniform interface to:
    • Metacat, SRB, DiGIR, Xanthoria, etc.
    • Anyone can implement these interfaces
    • Hides complexity of underlying systems
• Metadata-mediated data access
  – Supports multiple metadata standards
  – EML, Darwin Core as foci
• Computational services
  – Pre-defined analytical services
  – On-the-fly analytical services
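The idea of one uniform interface over several backends can be illustrated with a small adapter sketch. This is only an illustration under stated assumptions: `EcoGridNode`, `InMemoryNode`, `federated_query`, and the record contents are hypothetical names invented here, not the actual EcoGrid API, and the in-memory dictionary stands in for a real Metacat, SRB, or DiGIR backend.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical uniform interface that every backend adapter implements."""

    @abstractmethod
    def query(self, keyword: str) -> list[str]:
        """Return identifiers of records whose title mentions the keyword."""

    @abstractmethod
    def get(self, identifier: str) -> dict:
        """Return the record stored under the identifier."""


class InMemoryNode(EcoGridNode):
    """Stand-in for a real backend (Metacat, SRB, DiGIR, ...), backed by a dict."""

    def __init__(self, name: str, records: dict[str, dict]):
        self.name = name
        self._records = records

    def query(self, keyword):
        kw = keyword.lower()
        return sorted(
            rid for rid, rec in self._records.items()
            if kw in rec.get("title", "").lower()
        )

    def get(self, identifier):
        return self._records[identifier]


def federated_query(nodes, keyword):
    """Ask every node the same question through the same interface."""
    return {node.name: node.query(keyword) for node in nodes}


nodes = [
    InMemoryNode("metacat-like", {"m.1": {"title": "Kelp forest biomass"}}),
    InMemoryNode("digir-like", {"d.7": {"title": "Kelp specimen records"},
                                "d.8": {"title": "Lichen survey"}}),
]
hits = federated_query(nodes, "kelp")
```

The client code in `federated_query` never needs to know which kind of system sits behind a node, which is the sense in which the interface hides the complexity of the underlying systems.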


Grid versus Web Services

• Grid Services are Web Services
  – Add authentication, lifecycle management, notification, etc.
  – Globus Toolkit 3: Implements Open Grid Services Architecture (OGSA)
• Implications for use
  – Write a normal web service extending GridService base class
  – When deployed within GT3, you get these extra functions for ‘free’
  – Supports distributed computation via proxy authentication
• Problems
  – Complex system to understand
  – GT3 can be difficult to deploy
  – Proposals to incorporate grid services within the Web services community (Web Services Resource Framework [WSRF])


EcoGrid client interactions

• Modes of interaction
  – Client-server
  – Fully distributed
  – Peer-to-peer
• EcoGrid Registry
  – Node discovery
  – Service discovery
• Aggregation services
  – Centralized access
  – Reliability
  – Data preservation
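The two registry functions named above, node discovery and service discovery, could be sketched as follows. The class name `EcoGridRegistry`, its methods, the node names, and the example.org endpoints are all hypothetical; the slide does not describe the real registry's interface.

```python
from collections import defaultdict


class EcoGridRegistry:
    """Hypothetical registry: maps node names to endpoints and service types."""

    def __init__(self):
        self._endpoints = {}
        self._by_service = defaultdict(set)

    def register(self, node, endpoint, services):
        """Record a node's endpoint and the services it offers."""
        self._endpoints[node] = endpoint
        for service in services:
            self._by_service[service].add(node)

    def discover_nodes(self):
        """Node discovery: every registered node name, sorted."""
        return sorted(self._endpoints)

    def discover_service(self, service):
        """Service discovery: endpoints of nodes offering the service."""
        return sorted(self._endpoints[n] for n in self._by_service.get(service, ()))


registry = EcoGridRegistry()
registry.register("nodeA", "http://a.example.org/ecogrid", ["query", "get"])
registry.register("nodeB", "http://b.example.org/ecogrid", ["query"])
```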


Building the EcoGrid

[Figure: EcoGrid node diagram. Node labels: AND, LUQ, HBR, NTL, VCR. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system.]

• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)


Kepler: Scientific Workflows

EML provides semi-automated data binding

Scientific workflows represent knowledge about the process; Kepler captures this knowledge

Query EcoGrid to find data

Archive output to EcoGrid


GARP Invasive Species Model

[Figure: GARP workflow diagram. Data items: (a) DiGIR species presence & absence points (native range; invasion area); (b) SRB environmental layers (native range; invasion area); (c) integrated layers (native range; invasion area); (d) training sample and test sample; (e) GARP rule set; (f) native range prediction map and invasion area prediction map; (g) model quality parameter. Processing steps: EcoGridQuery, LayerIntegration, Sample, DataCalculation, Map, Validation, User. Other labels: +A1, +A2, +A3.]

Slide from D. Pennington

Scientific workflows represent knowledge about the process; AMS captures this knowledge


Kepler Team, Projects, Sponsors

• Ilkay Altintas, SDM
• Chad Berkley, SEEK
• Shawn Bowers, SEEK
• Jeffrey Grethe, BIRN
• Christopher H. Brooks, Ptolemy II
• Zhengang Cheng, SDM
• Efrat Jaeger, GEON
• Matt Jones, SEEK
• Edward A. Lee, Ptolemy II
• Kai Lin, GEON
• Bertram Ludäscher, BIRN, GEON, SDM, SEEK
• Steve Mock, NMI
• Steve Neuendorffer, Ptolemy II
• Jing Tao, SEEK
• Mladen Vouk, SDM
• Yang Zhao, Ptolemy II
• …

Ptolemy II


Kepler Understands EML Data (Chad Berkley, SEEK)


Kepler: Ecological Modeling (Chad Berkley, SEEK)


Database Access (Efrat Jaeger, GEON)

Note: EML descriptions of relational sources would allow automated data ingestion


Mineral Classification with Kepler … (Efrat Jaeger, GEON)


… inside the Classifier


Standard BrowserUI: Client-Side SVG


SWF Reengineering (Ilkay, SDM; Ashraf, Efrat, Kai, GEON)


DataMapper Sub-Workflow


Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)


Distributed Workflows in KEPLER

• Web and Grid Service plug-ins
  – WSDL (now) and Grid services (stay tuned …)
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
  – SSH, SCP, SDSC SRB, OGS?-???… coming
• WS Harvester
  – Import query-defined WS operations as Kepler actors
• XSLT and XQuery Data Transformers
  – to link not “designed-to-fit” web services
• WS-deployment interface (planned)


Web Service Actor (Ilkay Altintas, SDM)

Given a WSDL and the name of one of the web service's operations, the actor dynamically customizes itself to implement and execute that operation.

Configure - select service operation
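The self-customizing behavior can be pictured with a toy actor that builds its ports from a service description once an operation is selected. Everything here is an assumption made for illustration: the `description` dictionary is a stand-in for a parsed WSDL (it is not a real WSDL structure), the `call` entries stand in for remote invocations, and `WebServiceActor.fire` is not Kepler's actual actor API.

```python
class WebServiceActor:
    """Toy actor that specializes itself from a service description.

    The description dict is a stand-in for a parsed WSDL; it is not a real
    WSDL structure, and the callables stand in for remote invocations.
    """

    def __init__(self, description, operation):
        if operation not in description["operations"]:
            raise ValueError(f"unknown operation: {operation}")
        op = description["operations"][operation]
        self.name = f'{description["service"]}.{operation}'
        # Ports are derived from the selected operation, not hard-coded.
        self.input_ports = list(op["inputs"])
        self.output_ports = list(op["outputs"])
        self._invoke = op["call"]

    def fire(self, **inputs):
        """Check that the inputs match the ports, then run the operation."""
        if sorted(inputs) != sorted(self.input_ports):
            raise TypeError(f"expected inputs {self.input_ports}")
        result = self._invoke(**inputs)
        return dict(zip(self.output_ports, [result]))


# Hypothetical service with a single operation.
description = {
    "service": "UnitConverter",
    "operations": {
        "toKelvin": {"inputs": ["celsius"], "outputs": ["kelvin"],
                     "call": lambda celsius: celsius + 273.15},
    },
}
actor = WebServiceActor(description, "toKelvin")
out = actor.fire(celsius=0.0)
```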


Set Parameters and Commit


Specialized WS Actor (after instantiation)


Web Service Harvester (Ilkay Altintas, SDM)

• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.


Kepler: Grid Services Access (Steve Mock, NMI)


An (oversimplified) Model of the Grid

• Hosts: {h1, h2, h3, …}
• Data@Hosts: d1@{hi}, d2@{hj}, …
• Functions@Hosts: f1@{hi}, f2@{hj}, …
• Given: data/workflow:
  – … as a functional plan: […; Y := f(X); Z := g(Y); …]
  – … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
• Find Host Assignment: di → hi, fj → hj for all di, fj
  … s.t. […; d3@h3 := f@h2(d1@h1), …] is a valid plan

[Figure: dataflow X → f → Y → g → Z]
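A brute-force search over host assignments gives a feel for the problem. The slide does not define what makes a plan valid, so this sketch assumes a simple rule: a step is valid when its function runs on a host that offers it and its input exists on some host. The catalog contents and the function name `host_assignments` are hypothetical.

```python
from itertools import product

# Hypothetical replica catalog: which hosts hold which data and functions.
data_at = {"X": {"h1", "h3"}}
func_at = {"f": {"h2", "h3"}, "g": {"h2"}}

# Functional plan [Y := f(X); Z := g(Y)] as (output, function, input) steps.
plan = [("Y", "f", "X"), ("Z", "g", "Y")]


def host_assignments(plan, data_at, func_at):
    """Yield one dict per valid assignment of a host to each function.

    Assumption for this sketch: a step is valid whenever its function runs on
    a host that offers it and its input exists somewhere; inputs are shipped
    to the function, and each output is created on the host that ran the step.
    """
    funcs = [f for _, f, _ in plan]
    for hosts in product(*(sorted(func_at[f]) for f in funcs)):
        available = {d: set(hs) for d, hs in data_at.items()}
        ok = True
        for (out, f, inp), h in zip(plan, hosts):
            if not available.get(inp):
                ok = False  # the input does not exist on any host
                break
            available.setdefault(out, set()).add(h)
        if ok:
            yield dict(zip(funcs, hosts))


assignments = list(host_assignments(plan, data_at, func_at))
```

A realistic planner would also weigh transfer costs between hosts; this sketch only checks feasibility.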


Shipping & Handling Algebra (SHA)

[Figure: logical view of function f@a applied to data x@b with result y@c, alongside the physical view showing SHA plans (1), (2), and (3).]

plan Y@C = F@A of X@B =

1. [ X@B to A, Y@A := F@A(X@A), Y@A to C ]

2. [ F@A => B, Y@B := F@B(X@B), Y@B to C ]

3. [ X@B to C, F@A => C, Y@C := F@C(X@C) ]
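The three plans can be written down as data and compared. The encoding of steps as tuples, the function names `sha_plans` and `plan_cost`, the cost model (total size shipped between hosts), and the sizes are all assumptions made for this sketch; the slide itself only lists the plans.

```python
def sha_plans(f_host, x_host, y_host):
    """Enumerate the three plans for Y@y_host = F@f_host of X@x_host.

    Each step is a tuple: ("ship_data", item, src, dst),
    ("ship_function", item, src, dst), or ("apply", host).
    """
    a, b, c = f_host, x_host, y_host
    return {
        1: [("ship_data", "X", b, a), ("apply", a), ("ship_data", "Y", a, c)],
        2: [("ship_function", "F", a, b), ("apply", b), ("ship_data", "Y", b, c)],
        3: [("ship_data", "X", b, c), ("ship_function", "F", a, c), ("apply", c)],
    }


def plan_cost(plan, size):
    """Sum the sizes of everything shipped between different hosts."""
    return sum(size[step[1]] for step in plan
               if step[0] != "apply" and step[2] != step[3])


# Hypothetical sizes in megabytes: large input, small code, small result.
size = {"X": 5000, "F": 2, "Y": 10}
plans = sha_plans("A", "B", "C")
costs = {n: plan_cost(p, size) for n, p in plans.items()}
best = min(costs, key=costs.get)
```

With these made-up sizes, shipping the function to the data (plan 2) moves the least; with a small input and a large result, a different plan would win.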


Grid-Enabling PTII: Handles

[Figure: actors A and B in Kepler space, with GA and GB in Grid space; arrows numbered 1 to 7 correspond to the steps below.]

1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)

Example: &X = “GA.17”, *X = <some_huge_file>

Candidate Formalisms:
• GridFTP
• SSH, SCP
• SDSC SRB
• OGS?-??? … WSRF?

Logical token transfer (3) requires get_handle (1, 2), then exec_handle (4, 5, 6, 7) for completion.
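The seven-step exchange can be simulated in a few lines to show that only the handle travels between actors while the payload moves between grid nodes. The `GridNode` class and its methods are hypothetical; the starting counter of 17 is chosen only so that the first handle matches the slide's example "GA.17".

```python
class GridNode:
    """Toy grid-space store; handles look like 'GA.17'."""

    def __init__(self, name, first_id=17):
        self.name = name
        self._next_id = first_id
        self._store = {}

    def get_handle(self, payload):
        """Steps 1-2: store the payload and return a handle (&X)."""
        handle = f"{self.name}.{self._next_id}"
        self._next_id += 1
        self._store[handle] = payload
        return handle

    def send(self, handle):
        """Step 6: release the payload (*X) to another grid node."""
        return self._store[handle]

    def request(self, handle, nodes):
        """Steps 4-7: fetch *X from the node named in the handle."""
        owner = nodes[handle.split(".")[0]]
        self._store[handle] = owner.send(handle)
        return ("done", handle)


ga, gb = GridNode("GA"), GridNode("GB")
nodes = {"GA": ga, "GB": gb}

handle = ga.get_handle("<some_huge_file>")   # steps 1-2, on behalf of actor A
token = handle                               # step 3: A sends only &X to B
ack = gb.request(token, nodes)               # steps 4-7, on behalf of actor B
```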


Homogeneous Data Integration

• Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward


Heterogeneous Data Integration

• Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known
    • e.g., that ArealDensity=Count/Area
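A small worked example shows why the units and the relationship ArealDensity = Count/Area must both be known before two datasets can be compared. The unit table, field names, and plot values below are made up for illustration.

```python
# Hypothetical unit table: factor that converts each area unit to square meters.
TO_SQUARE_METERS = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


def areal_density(count, area, area_unit):
    """Derive ArealDensity = Count / Area, reported per square meter."""
    if area <= 0:
        raise ValueError("area must be positive")
    return count / (area * TO_SQUARE_METERS[area_unit])


# Two datasets that report the same kind of quantity in different forms.
plot_a = {"count": 50, "area": 0.5, "unit": "ha"}      # count + area in hectares
plot_b = {"density_per_m2": 0.01}                      # already a density

density_a = areal_density(plot_a["count"], plot_a["area"], plot_a["unit"])
comparable = [density_a, plot_b["density_per_m2"]]
```

Without the metadata saying that `plot_a` stores a count over an area in hectares, the raw values 50 and 0.01 could not be placed on the same scale.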


Semantic Mediation

• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
• Use reasoning engines to generate transformation steps
  – Beware analytical constraints
• Use reasoning engine to discover relevant components

[Figure labels: Data, Ontology, Workflow Components]
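Component discovery by semantic type can be sketched with a tiny is-a hierarchy standing in for the ontology and a loop standing in for the reasoning engine. The concept names other than AbundanceCount and MortalityRate, the component library, and the function names are hypothetical; a real system would use an OWL reasoner rather than this hand-written subsumption check.

```python
# Hypothetical mini-ontology: child concept -> parent concept.
IS_A = {
    "AbundanceCount": "Count",
    "Count": "Measurement",
    "MortalityRate": "Rate",
    "Rate": "Measurement",
}


def is_subconcept(concept, ancestor):
    """True if concept equals ancestor or reaches it by following IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# Hypothetical component library: name -> (input type, output type).
COMPONENTS = {
    "normalize": ("AbundanceCount", "MortalityRate"),
    "summarize": ("Measurement", "Measurement"),
    "rate_plot": ("Rate", "Measurement"),
}


def relevant_components(data_type):
    """Components whose declared input type accepts the labeled data."""
    return sorted(name for name, (inp, _) in COMPONENTS.items()
                  if is_subconcept(data_type, inp))


found = relevant_components("AbundanceCount")
```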


Ecological ontologies

• What was measured (e.g., biomass)
• Type of measurement (e.g., Energy)
• Context of measurement (e.g., Psychotria limonensis)
• How it was measured (e.g., dry weight)
• SEEK intends to enable community-created ecological ontologies using OWL
  – Represents a controlled vocabulary for ecological metadata


Extensions: Semantic Types

• Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
• Application: e.g., design support:
  – smart/semi-automatic wiring, generation of “massaging actors”

[Figure: actor m1 (normalize) with ports p3 and p4]

Takes Abundance Count Measurements for Life Stages

Returns Mortality Rate Derived Measurements for Life Stages


Semantic Types

• The semantic type signature
  – Type expressions over the (OWL) ontology

[Figure: actor m1 (normalize) with ports p3 and p4]

SemType m1 ::
  Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
  ->
  DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
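The signature of m1 can be encoded as two conjunctions and used to check whether a data source fits the actor's input. This sketch treats each expression as an opaque string and checks only that the required conjuncts are present; it performs no OWL reasoning. The `SemType` class, `apply_m1`, and the extra annotation `hasUnit.Individuals` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemType:
    """A conjunction (the '&' in the signature) of ontology expressions."""
    conjuncts: frozenset

    @classmethod
    def of(cls, *exprs):
        return cls(frozenset(exprs))

    def accepts(self, other):
        """other satisfies self if it carries every conjunct self requires."""
        return self.conjuncts <= other.conjuncts


# The signature of m1 from the slide: input type -> output type.
M1_IN = SemType.of("Observation", "itemMeasured.AbundanceCount",
                   "hasContext.appliesTo.LifeStageProperty")
M1_OUT = SemType.of("DerivedObservation", "itemMeasured.MortalityRate",
                    "hasContext.appliesTo.LifeStageProperty")


def apply_m1(port_type):
    """Return m1's output type, or None if the input port does not fit."""
    return M1_OUT if M1_IN.accepts(port_type) else None


# Hypothetical dataset annotation with one extra conjunct.
dataset = SemType.of("Observation", "itemMeasured.AbundanceCount",
                     "hasContext.appliesTo.LifeStageProperty",
                     "hasUnit.Individuals")
result = apply_m1(dataset)
rejected = apply_m1(SemType.of("Observation", "itemMeasured.Biomass"))
```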


Extended Type System (here: OWL Semantic Types)

SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Substructure association:

XML raw-data =(X)Query=> object model =link=> OWL ontology
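The chain from raw XML through a query to an object model and on to ontology concepts might look like the following. The XML element names, the `LINKS` table, and the use of ElementTree path expressions in place of XQuery are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw data; element and attribute names are made up for the sketch.
RAW = """
<dataset>
  <row stage="egg"><count>120</count></row>
  <row stage="larva"><count>45</count></row>
</dataset>
"""

# Hypothetical links from object-model fields to ontology concepts.
LINKS = {
    "count": "itemMeasured.AbundanceCount",
    "stage": "hasContext.appliesTo.LifeStageProperty",
}

root = ET.fromstring(RAW)

# Query step: an XPath-style selection builds a simple object model.
objects = [
    {"stage": row.get("stage"), "count": int(row.findtext("count"))}
    for row in root.findall("./row")
]

# Link step: attach the ontology concept to each field of each object.
annotated = [
    {field: (value, LINKS[field]) for field, value in obj.items()}
    for obj in objects
]
```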


Semantic Types for Scientific Workflows


Deriving Data Transformations from Semantic Service Registration

[Bowers-Ludaescher,DILS’04]


Structural and Semantic Mappings

[Bowers-Ludaescher,DILS’04]


SEEK Impact

• Fundamental improvements for researchers
  – Global access to ecologically relevant data
  – Rapidly locate and utilize distributed computation
  – Capture, reproduce, extend analysis process


Acknowledgements

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)

Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON