UMBC an Honors University in Maryland Examples of Integrating Ecological Information on the Semantic...

Post on 27-Mar-2015

217 views 1 download

Tags:

Transcript of UMBC an Honors University in Maryland Examples of Integrating Ecological Information on the Semantic...

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Examples of Integrating Ecological Information on the Semantic Web

Joel Sachs and Cynthia Simms Parrcontact: jsachs@cs.umbc.edu

Joint work with: Andriy Parafiynyk, Li Ding, Rong Pan, Lushan Han, and Tim Finin

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Overview of Talk

• Introduction and background– What are we going to integrate?

– And why?

• ELVIS– The Food Web Constructor and the Evidence Provider

• ETHAN– An ontology for evolutionary trees and natural history

• Tripleshop and Swoogle• Questions

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

An NSF ITR collaborative project with•University of Maryland, Baltimore County •University of Maryland, College Park•U. Of California, Davis•Rocky Mountain Biological Laboratory •NASA Goddard Space Flight Center•NBII

An NSF ITR collaborative project with•University of Maryland, Baltimore County •University of Maryland, College Park•U. Of California, Davis•Rocky Mountain Biological Laboratory •NASA Goddard Space Flight Center•NBII

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

An invasive species scenario

• Nile Tilapia fish have been found in a California lake.• Can this invasive species thrive in this environment?• If so, what will be the likely

consequences for theecology?

• So…we need to understandthe effects of introducingthis fish into the food webof a typical California lake

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Bacteria

Microprotozoa

Amphithoe longimana

Caprella penantis

Cymadusa compta

Lembos rectangularis

Batea catharinensis

Ostracoda

Melanitta

Tadorna tadorna

ELVIS: Ecosystem Localization, Visualization, and Information System

Oreochromis niloticusNile tilapia

??

. . .

Species list constructor

Food web constructor

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

The problem

• We have data on what species are known to be in the location and can further restrict and fill in with other ecological models

• But we don’t know which of these the Nile Tilapia eats of who might eat it.

• We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Food Web

G

S

node

link

Evolutionary tree

step

G

taxonS

taxon

A

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Food Web ConstructorPredict food web links using database and taxonomic reasoning.

In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Food Web Constructor generates possible links

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Evidence provider gives details

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Food web database

Source Webs Nodes Links

Animal Diversity Web n/a 711 2165

EcoWEB 212 4503 11967

Webs on the Web 19 1373 12056

Interaction Web DB 26 2139 9882

Tuesday Lake 2 101 510

Total 259 8827 36580

4600 distinct taxa

Food web data: Cohen 1989, Dunne et al. 2006, Vazquez 2006, Jonsson et al. 2005

Evolutionary tree: Parr et al. 2004. + plants from ITIS + hierarchy of non-taxonomic nodes

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Testing the algorithm

• Take each web out of the database• Attempt to predict its links• Compare prediction with actual data

Accuracy percentage of all predictions that are correct 89%

Precision percentage of predicted links that are correct 55%

Recall percentage of actual links that are predicted 47%

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Evolutionary distance threshold2 steps up and 4 steps down

1 2 3 4S1

S40.3

0.35

0.4

0.45

0.5

1 2 3 4S1

S30.4

0.45

0.5

0.55

0.6

steps up

steps down

precision

steps up

recall

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Evolutionary direction penalty not very sensitive

AB

XA XA YB

1

1 ( ) ( )YB

WeightDistance Penalty Distance Penalty

ancestor

descendent

siblings

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Negative evidence discount is sensitive

XY

1

( )i

Ni

i

weightCertaintyIdx LinkValue

discount

0.2

0.3

0.4

0.5

0.6

0.7

0 25 50 75 100

Negative evidence discount

Recall

Precision

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Some phyla are easier to predict than others

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Annelida

Arthro

poda

Bacill

ario

phyta

Chordata

Mollu

sca

Phylum

Re

ca

ll ra

te

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Trait space distance weighting

Euclidean distance in natural historyN-space

Parameterize functions from the literature that might predict links using characteristics of taxa. For example, size or stoichiometry.

LinkStatusAB= ƒ(α, sizeA, sizeB), ƒ(β, stoichA, stoichB) …

…need more data

How can we do better predicting links?

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Status

• Goal is ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.

• Background ontologies

– SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS related tasks, inputs and outputs

– ETHAN (Evolutionary Trees and Natural History) Concepts and properties for ‘natural history’ information on species derived from data in the Animal diversity web and other taxonomic sources

• Under development

– Connect to visualization software

– Connect to triple shop to discover more data

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

ETHANEvolutionary Trees and Natural History ontology

Animal Diversity Webhttp://www.animaldiversity.org• geographic range• habitats• physical description • reproduction• lifespan• behavior and trophic info• conservation status

“Esox lucius” hasMaxMass “1.4 kg”

“Esox lucius” isSubclassOf “Esox”

“Esox” eats “Actinopterygii”

Triples

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Triple Shop

Actinopterygii.owl

webs_publisher.php?published_study=11

Esox_lucius.owl

http://swoogle.umbc.edu

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Triple Shop What are body masses of fish that eat

fish?Enter a SPARQL query

SELECT DISTINCT ?predator ?prey ?preymaxmass ?predatormaxmass

WHERE {

?link rdf:type spec:ConfirmedFoodWebLink .

?link spec:predator ?predator .

?link spec:prey ?prey .

?predator rdfs:subClassOf ethan:Actinopterygii .

?prey rdfs:subClassOf ethan:Actinopterygii .

OPTIONAL { ?predator kw:mass_kg_high ?predatormaxmass } . OPTIONAL { ?prey kw:mass_kg_high ?preymaxmass }

}

. . . leaving out the FROM clause

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Which fish eat other fish? Show their max body masses.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Enter query w/oFROM clause!

log in

specify dataset

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

302 RDF documents were found that might have useful data.

302 RDF documents were found that might have useful data.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

We’ll select them all and add them to the current dataset.

We’ll select them all and add them to the current dataset.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

We’ll run the query against this dataset to see if the results are as expected.

We’ll run the query against this dataset to see if the results are as expected.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

The results can be produced in any of several formats

The results can be produced in any of several formats

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Results

http://sparql.cs.umbc.edu/tripleshop2/

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Looks like a useful dataset. Let’s save it and also materialize it the TS triple store.

Looks like a useful dataset. Let’s save it and also materialize it the TS triple store.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

We can also annotate, save and share queries.

We can also annotate, save and share queries.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Work in Progress

• There are a host of performance issues• We plan on supporting some special datasets, e.g.,

– FOAF and SPIRE data collected from Swoogle– Definitions of RDF and OWL classes and properties

from all ontologies that Swoogle has discovered• Expanding constraints to select candidate SWDs to

include arbitrary metadata and embedded queries– FROM “documents trusted by a member of the SPIRE

project”• “Qurantine” needed to handle conflicts.

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

An RDF Dictionary

• We hope to develop an RDF dictionary.• Given an RDF term, returns a graph of its definiton

– Term definition from “official” ontology– Term+URL definition from SWD at URL– Term+* union definition– Optional argument recursively adds definitions of

terms in definition excluding RDFS and OWL terms– Optional arguments identifies more namespaces to

exclude

UMBCUMBC

an Honors University in an Honors University in

MarylandMaryland

Review

• All Elvis functionality is encapsulated as web services, and all input and output is OWL based.– So Elvis integrates easily with other semantic web applications,

like the TripleShop.

• ELVIS as a platform for experimenting with different approached to food web prediction.

• TripleShop as an integrating platform• TripleShop allows researchers to semi-automatically

construct datasets in response to ad-hoc queries.• Contact jsachs@cs.umbc.edu to participate.