Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Semantics Technology Demonstration, Ecoinformatics International Technical Collaboration. April 9, 2008, Research Triangle Park, North Carolina, USA. Bruce Bargmeyer, Lawrence Berkeley National Laboratory and University of California, Berkeley. Tel: +1 510-495-2905


Page 1: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Semantics Technology Demonstration
Ecoinformatics International Technical Collaboration

April 9, 2008

Research Triangle Park, North Carolina, USA

Bruce Bargmeyer
Lawrence Berkeley National Laboratory and University of California, Berkeley
Tel: +1 510-495-2905

Page 2: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Topics

Describe challenges to be addressed
Describe the demo scenarios
Describe the initial demo
Describe the technology/infrastructure
Discuss Collaboration

Page 3: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Challenge: Access Dispersed Data. Convey Common Understanding of Meaning between Data Creators and Data Users

[Diagram: users and information systems draw on data created by EEA, USGS, DoD, EPA, and others. Each source holds text and numeric data under subject labels (environ, agriculture, climate, human health, industry, tourism, soil, water, air); some sources use Spanish labels (ambiente, agricultura, industria, turismo, tierra, agua).]

A common interpretation of what the data represents

Page 4: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Challenge: Combine Data, Metadata & Concept Systems

Data:

ID | Date | Temp | Hg
A | 06-09-13 | 4.4 | 4
B | 06-09-13 | 9.3 | 2
X | 06-09-13 | 6.7 | 78

Metadata:

Name | Datatype | Definition | Units
ID | text | Monitoring Station Identifier | not applicable
Date | date | Date | yy-mm-dd
Temp | number | Temperature (to 0.1 degree C) | degrees Celsius
Hg | number | Mercury contamination | micrograms per liter

Concept system:

Contamination
  Biological
  Radioactive
  Chemical
    lead
    cadmium
    mercury

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”
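The way the three layers on this slide combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `concept` key that links a column to the concept system, and the function names, are hypothetical and are not part of XMDR. The sketch covers only the "chemical contamination over 10 micrograms per liter" part of the query; the downstream and date-range conditions would need location and time data that the slide does not show.

```python
# Illustrative sketch only; names and structure are assumptions, not XMDR's API.

# Data rows from the slide (station ID, date, temperature, mercury).
data = [
    {"ID": "A", "Date": "06-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "06-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "06-09-13", "Temp": 6.7, "Hg": 78},
]

# Metadata from the slide. The "concept" key is a hypothetical link
# from a column to a node in the concept system.
metadata = {
    "Hg": {
        "definition": "Mercury contamination",
        "units": "micrograms per liter",
        "concept": "mercury",
    },
}

# Concept system from the slide, stored as child -> parent.
broader = {
    "Biological": "Contamination",
    "Radioactive": "Contamination",
    "Chemical": "Contamination",
    "lead": "Chemical",
    "cadmium": "Chemical",
    "mercury": "Chemical",
}


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = broader.get(concept)
    return False


def find_exceedances(ancestor, threshold, units):
    """Find rows where any column classified under `ancestor` exceeds `threshold`."""
    hits = []
    for column, meta in metadata.items():
        if is_a(meta["concept"], ancestor) and meta["units"] == units:
            hits += [(row["ID"], column, row[column])
                     for row in data if row[column] > threshold]
    return hits


print(find_exceedances("Chemical", 10, "micrograms per liter"))
```

The query names "chemical contamination", yet the data column is called Hg. The match is only possible because the metadata ties Hg to mercury and the concept system places mercury under Chemical.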

Page 5: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 


Challenge: Use data from systems that record the same facts with different terms

Reduce the human toil of drawing information together and performing analysis by shifting that work to computer processing.

Page 6: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Same Fact, Different Terms

Data Elements

ISO 3166 2-Alpha Code | ISO 3166 3-Alpha Code | ISO 3166 3-Numeric Code | ISO 3166 English Name | ISO 3166 French Name
DZ | DZA | 012 | Algeria | L'Algérie
BE | BEL | 056 | Belgium | Belgique
CN | CHN | 156 | China | Chine
DK | DNK | 208 | Denmark | Danemark
EG | EGY | 818 | Egypt | Egypte
FR | FRA | 250 | France | La France
. . .
ZW | ZWE | 716 | Zimbabwe | Zimbabwe

Data element attributes: Name, Context, Definition, Unique ID (4572), Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others

Data Element Concept

Name: Country Identifiers
Unique ID: 5769
Other attributes: Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Countries: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe

Page 7: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Demo with Microsoft eScience

Collaborate with Microsoft Research, San Francisco Office

Collaboration already ongoing with LBNL and UCB, Berkeley Water Center.

Somewhat like Hydroseek, but with XMDR for concept systems and metadata

Hydroseek accesses EPA STORET and USGS NWIS (Water Data)


Page 8: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Scenarios

Scenario 1 – Semantics enabled data access
  Semantics enabled access to data and metadata that may serve as an indicator (or as input to a more complex indicator)

Scenario 2 – Data harmonization
  People from different states or countries (political jurisdictions) are interested in water quality. They want to develop a particular indicator of interest based on data that crosses political jurisdictions.

Page 9: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Scenarios (Continued)

Scenario 3 – Simulation models
  Use XMDR to document parameters (input data, output data, initialization parameters, etc.) for water, air, and subsurface models, so as to support remote simulation model integration.
  If a user draws a box around a geographic area, they can see whether any models have been run for it.
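Scenario 3 could be sketched as follows, assuming each documented model run carries a geographic extent alongside its parameters. Everything here is hypothetical: the class names, field names, and the two sample runs are invented for illustration and do not come from XMDR or from any real model registry. The box test ignores extents that cross the antimeridian.

```python
from dataclasses import dataclass, field


@dataclass
class BBox:
    """Geographic bounding box in decimal degrees."""
    south: float
    west: float
    north: float
    east: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap when they overlap in both latitude and longitude.
        return (self.south <= other.north and other.south <= self.north
                and self.west <= other.east and other.west <= self.east)


@dataclass
class ModelRun:
    """Hypothetical registry record documenting one simulation model run."""
    name: str
    domain: str  # e.g. "water", "air", "subsurface"
    extent: BBox
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    init_params: dict = field(default_factory=dict)


# Invented sample records, for illustration only.
registry = [
    ModelRun("bear-river-flow", "water", BBox(41.0, -112.0, 43.0, -110.5),
             inputs=["precipitation"], outputs=["streamflow"],
             init_params={"time_step_hours": 24}),
    ModelRun("coastal-air", "air", BBox(34.0, -120.0, 36.0, -118.0),
             inputs=["emissions"], outputs=["ozone"]),
]


def runs_in(box: BBox) -> list:
    """Return the names of model runs whose extent overlaps the user's box."""
    return [run.name for run in registry if run.extent.intersects(box)]


print(runs_in(BBox(42.0, -111.5, 42.5, -111.0)))
```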

Page 10: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Scenario 1

Semantics enabled access to data and metadata that may serve as an indicator (or as input to a more complex indicator)

Person uses concept systems to find variables of interest, accesses the data for the variables, and views metadata describing the data.

  Use concept system to identify possible variables that have data for a specific time and geographic coverage.
  Use concept system to create query to access data from multiple sources.
  Access/obtain the data.
  System performs mediation of results from different result sets (simple transformations based on information in metadata registry).
  Display data with links to metadata.
  User can go to metadata to better understand the data, e.g., provenance, measurement units, collection methods.
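The mediation step in this scenario (simple transformations driven by registry metadata) might look like the sketch below. The variable codes, registry layout, and values are all hypothetical; the point is only that the unit recorded in the metadata decides which conversion is applied before result sets are merged.

```python
# Hypothetical registry entries: variable code -> concept and recorded unit.
registry = {
    "SRC_A:001": {"concept": "mercury", "units": "micrograms per liter"},
    "SRC_B:Hg": {"concept": "mercury", "units": "milligrams per liter"},
    "SRC_B:T": {"concept": "temperature", "units": "degrees Celsius"},
}

# Conversion factors into the common unit chosen for this sketch.
TO_MICROGRAMS_PER_LITER = {
    "micrograms per liter": 1.0,
    "milligrams per liter": 1000.0,
}


def mediate(result_sets, concept):
    """Merge result sets for one concept, converting values to a common unit."""
    merged = []
    for variable, values in result_sets.items():
        meta = registry[variable]
        if meta["concept"] != concept:
            continue  # a different variable; not part of this query
        factor = TO_MICROGRAMS_PER_LITER[meta["units"]]
        for value in values:
            merged.append({
                "value": value * factor,
                "units": "micrograms per liter",
                "source_variable": variable,  # link back to the metadata
            })
    return merged


# Invented result sets from two sources.
results = {
    "SRC_A:001": [4.0, 78.0],
    "SRC_B:Hg": [0.5],
    "SRC_B:T": [6.7],
}

for row in mediate(results, "mercury"):
    print(row)
```

Each merged row keeps the code of the variable it came from, which is what allows the display to link a value back to its metadata (provenance, measurement units, collection methods).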

Page 11: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Scenario 1

Use a combination of XMDR and Hydroseek-like software
  XMDR holds the concept system(s) and metadata.
  Hydroseek-like software interacts with the user, accesses data, and displays results.
  The mediation tool is separate from XMDR, but draws on metadata from XMDR.
  Also need whatever is necessary to interact with the external data source (e.g., screen scraping, database access).
  Bora currently has a concept system that serves as the global ontology for variables in ~25 systems, e.g., STORET and NWIS. He used the USGS water words dictionary.

Page 12: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Hydroseek

Hydroseek is an ontology-aided search engine for finding scientific data on water quality and hydrology from approximately 1.9 million sites in the USA. Hydroseek creates a unified view over the databases of agencies such as the US Geological Survey and the Environmental Protection Agency.

It helps researchers remove semantic, syntactic, and information system heterogeneity barriers, improves the search experience, and reduces time spent on data discovery and preparation prior to processing. Depending on the method of interaction (GUI or web services) and the function invoked, output can be provided as CUAHSI WaterML, Geography Markup Language features, or Microsoft Excel.

The system uses the Microsoft Virtual Earth map interface, with OWL ontologies providing the knowledge base used to supply auto-complete keywords and classify search results.

Hydroseek follows a Service-Oriented Architecture (SOA), and most functionality is available via SOAP web services. The system also supports queries using NASA's Global Change Master Directory (GCMD) keywords via web services.

Page 13: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Hydroseek


Page 14: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Linking Concept Systems to Data


Page 15: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Little Demo

A little demo to show that what we talked about can be done with XMDR.

Use latitudes and longitudes for 3-4 sites and what they measure.

A small ontology with 5-6 concepts, and two ontologies for data sources (let's say USGS and EPA) with 5-6 variables (variable = name of what is being measured, i.e., parameter name), measurement method metadata, etc.

The idea is to show how these (including mappings between variables) can be stored in XMDR and how they can be discovered.

So it is a matter of getting them into XMDR and putting together some sort of web interface that takes a keyword and returns a list of sites, relevant measurements, etc.

This will be done with samples from the US at first and then with content from JRC and WISE.

Page 16: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Sample Data

Sites

Site Code | Site Name | Latitude | Longitude | Elevation | Vertical Datum | State | County
NWIS:10038000 | BEAR RIVER BLW SMITHS FORK, NR COKEVILLE, WY | 42.1266021 | -110.973243 | 1872 | NGVD29 | Wyoming | Lincoln
NWIS:10075000 | BEAR RIVER AT SODA SPRINGS, ID | 42.61381026 | -111.583556 | 1756.1 | NGVD29 | Idaho | Caribou
NWIS:10020100 | BEAR RIVER ABOVE RESERVOIR, NEAR WOODRUFF, UT | 41.4343899 | -111.0176863 | 1968 | NGVD29 | Wyoming | Uinta

Observations Catalog

Site Code | Variable Code | BeginDate | EndDate | Observation count
NWIS:10038000 | NWIS:00038 | 1/1/1956 | 1/1/2008 | 581
NWIS:10038000 | NWIS:00038 | 1/1/1980 | 1/1/2007 | 1254
NWIS:10075000 | NWIS:00038 | 2/5/1969 | 4/8/2001 | 65
NWIS:10075000 | NWIS:00038 | 1/2/1956 | 4/6/1990 | 47
NWIS:10020100 | EPA:12356-1 | 3/3/1920 | 4/3/1945 | 16
NWIS:10020100 | EPA:12356-1 | 2/5/1969 | 4/6/1990 | 9

Variables

Variable Code | Variable Name | Medium
NWIS:00038 | Ammonia Nitrogen | Water
EPA:12356-1 | Ammonia Nitrogen | Water
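The keyword lookup described for the little demo can be sketched against this sample data. The dictionary layout and the `search` function are assumptions made for illustration, not the demo's actual implementation; the site codes, variable codes, and counts are copied from the sample tables.

```python
# Sample data from the slide; the in-memory layout is an assumption.
sites = {
    "NWIS:10038000": "BEAR RIVER BLW SMITHS FORK, NR COKEVILLE, WY",
    "NWIS:10075000": "BEAR RIVER AT SODA SPRINGS, ID",
    "NWIS:10020100": "BEAR RIVER ABOVE RESERVOIR, NEAR WOODRUFF, UT",
}

variables = {
    "NWIS:00038": "Ammonia Nitrogen",
    "EPA:12356-1": "Ammonia Nitrogen",
}

# (site code, variable code, begin date, end date, observation count)
catalog = [
    ("NWIS:10038000", "NWIS:00038", "1/1/1956", "1/1/2008", 581),
    ("NWIS:10038000", "NWIS:00038", "1/1/1980", "1/1/2007", 1254),
    ("NWIS:10075000", "NWIS:00038", "2/5/1969", "4/8/2001", 65),
    ("NWIS:10075000", "NWIS:00038", "1/2/1956", "4/6/1990", 47),
    ("NWIS:10020100", "EPA:12356-1", "3/3/1920", "4/3/1945", 16),
    ("NWIS:10020100", "EPA:12356-1", "2/5/1969", "4/6/1990", 9),
]


def search(keyword):
    """Return (site code, site name, total observations) for matching variables."""
    needle = keyword.lower()
    codes = {code for code, name in variables.items() if needle in name.lower()}
    totals = {}
    for site, variable, _begin, _end, count in catalog:
        if variable in codes:
            totals[site] = totals.get(site, 0) + count
    return [(site, sites[site], total) for site, total in sorted(totals.items())]


for hit in search("ammonia"):
    print(hit)
```

Because both NWIS:00038 and EPA:12356-1 carry the name Ammonia Nitrogen, one keyword reaches sites catalogued under either agency's variable code.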

Page 17: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Small Concept System

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:xm="http://www.xmdr.org/">
  <xm:concept rdf:about="http://www.xmdr.org/ammoniaNitrogen" dc:title="Ammonia Nitrogen">
    <xm:narrowerThan>
      <xm:concept rdf:about="http://www.xmdr.org/nitrogen" dc:title="Nitrogen"/>
    </xm:narrowerThan>
    <xm:hasMedium>
      <xm:medium rdf:about="http://www.xmdr.org/water"/>
    </xm:hasMedium>
  </xm:concept>
</rdf:RDF>
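As a rough illustration, the concept system above can be read with Python's standard library. Two assumptions are made here: the `dc` prefix is bound to the Dublin Core elements namespace, which the slide does not declare, and the document is read as plain XML rather than through an RDF library, so this sketch only handles the exact nesting shown.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed binding for the dc prefix
XM_NS = "http://www.xmdr.org/"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}" xmlns:xm="{XM_NS}">
  <xm:concept rdf:about="http://www.xmdr.org/ammoniaNitrogen" dc:title="Ammonia Nitrogen">
    <xm:narrowerThan>
      <xm:concept rdf:about="http://www.xmdr.org/nitrogen" dc:title="Nitrogen"/>
    </xm:narrowerThan>
    <xm:hasMedium>
      <xm:medium rdf:about="http://www.xmdr.org/water"/>
    </xm:hasMedium>
  </xm:concept>
</rdf:RDF>"""

# ElementTree spells namespaced attribute names as "{namespace}local".
ABOUT = f"{{{RDF_NS}}}about"
TITLE = f"{{{DC_NS}}}title"
ns = {"xm": XM_NS}

root = ET.fromstring(DOC)
concept = root.find("xm:concept", ns)
broader = concept.find("xm:narrowerThan/xm:concept", ns)
medium = concept.find("xm:hasMedium/xm:medium", ns)

print(concept.get(TITLE), "is narrower than", broader.get(TITLE))
print("medium:", medium.get(ABOUT))
```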

Page 18: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Technology Overview

[Diagram: technology overview showing XMDR instances and metadata (provenance, etc.). Adapted from a slide from Bora Beran.]

Page 19: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Modular XMDR Architecture

USERS: Web Browsers, Client Software
Human User Interface (HTML from JSP and JavaScript; Exhibit)
Application Program Interface (REST)
Authentication Service
Mapping Engine
Search & Content Serving (Jena, Lucene)
Registry Store: standard XMDR files, Logic Index, Text Index, Postgres Database
Logic Indexer (Jena & Pellet)
Text Indexer (Lucene)
Content Loading & Transformation (LexGrid & custom)
Validation (XML Schema)
Metadata Sources: concept systems, data elements
XMDR metamodel (OWL & XML Schema)
Metamodel specs (UML & editing) (Poseidon, Protege)
XMDR data model & exchange format: XML, RDF, OWL
Legend: Third Party Software

Page 20: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Video and Discussion

View Video
Discuss

Page 21: Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Acknowledgements

John McCarthy, LBNL
Kevin Keck, LBNL
Bora Beran, Microsoft Research

This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.
