Semantic Web at BBN Parliament & ASIO SCOUT

SEMANTIC WEB AT BBN PARLIAMENT & ASIO SCOUT Dave Kolas March 23, 2011


Page 1: Semantic Web at BBN Parliament & ASIO SCOUT

SEMANTIC WEB AT BBN
PARLIAMENT & ASIO SCOUT

Dave Kolas
March 23, 2011

Page 2: Semantic Web at BBN Parliament & ASIO SCOUT

Semantic Technology at BBN

Research: Contributing to standards and technologies
- Contributing authors to OWL and SWRL
- Currently developing new semantic-based reasoning language: SILK
- Active in the Geospatial Semantic Web community, including GeoSPARQL

Applications: Addressing real-world, operational challenges
- Intelligence
- Data integration and disambiguation
- Geospatial
- Image Applications
- Analytics with Semantic Web underpinnings

Page 3: Semantic Web at BBN Parliament & ASIO SCOUT


Key BBN-Led Semantic Initiatives

[Timeline figure, 2000 to 2010. Milestones: SemWebCentral.org; asio.bbn.com; W3C OWL Recommendation; BBN Hosts ISWC 2009.
Projects shown: DAML Integration & Transition (DARPA); FCG (AFRL/AMC); NOTAMS (AFRL/AMC); Horus (DARPA/IMO); Combine/APSTARS; IEII (DARPA); ICEWS (DARPA); SID; Medical, Commercial Applications; JFP ACTD (JFCOM); GARCON-F (NGA); Geospatial SW (NGA); Integrated Learning (DARPA); Multi-INT Fusion (LM); DIESL (DARPA); SASSI/MMON; CODE/COBRA; ISSL; PINT.]

Page 4: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament
- In continuous customer use for ~8 years (originally DAML-DB)
- Triple store with SPARQL support
- Implemented as a persistence layer for Jena/Sesame
- Includes spatial and temporal indexing/processing
- Open source!
  http://parliament.semwebcentral.org/

Page 5: Semantic Web at BBN Parliament & ASIO SCOUT


Design

[Architecture diagram. Components: Joseki; Spatial Index Processor; Temporal Index Processor; Parliament Graph; Model; Indexing Graph; Spatial Index (PostGIS); Temporal Index (BDB); Parliament. Region labels: Part of Jena; Parliament Framework; External Storage.]

Page 6: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament’s Index Structure
- Applications often require efficient statement insertion
- Goal: balance insertion performance, query performance, and space required
- Parliament stores triples using two components:
  - Resource dictionary
  - Statement table

Page 7: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament Statement Table

Each entry (statement) contains:
- Three resource ID fields: subject, predicate, and object of the statement
- Three statement ID fields: next statements using the same resource as subject, predicate, and object
- Bit-field flags encoding statement attributes

Page 8: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament Resource Dictionary

Each entry (resource) contains:
- Bidirectional string-to-ID mapping
- Three statement ID fields: first statements using this resource as subject, predicate, and object
- Three count fields: numbers of statements using this resource as subject, predicate, and object
- Bit-field flags encoding resource attributes

Page 9: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament Index Example

Resource Table

ID  FS  FP  FO  SC  PC  OC
1   3   -   -   3   0   0
2   -   4   -   0   2   0
3   5   -   1   2   0   1
4   -   5   -   0   3   0
5   -   -   2   0   0   1
6   -   -   5   0   0   2
7   -   -   4   0   0   1

Statement List

ID  S  P  O
1   1  2  3
2   1  4  5
3   1  4  6
4   3  2  7
5   3  4  6
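The two structures can be sketched in a few lines of Python. This is a minimal illustration rather than Parliament's actual code. It assumes that FS/FP/FO are the "first statement" fields and SC/PC/OC the count fields described on the previous slides, and that each new statement is pushed onto the head of its three linked lists, which is what the example numbers suggest. The class and method names (`TinyStore`, `add`, `using`) and the resource names `r1` to `r7` are made up for the sketch, and the bit-field flags are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

S, P, O = 0, 1, 2  # positions within a statement


@dataclass
class Resource:
    name: str  # together with TinyStore.id_of, the string <-> ID mapping
    first: List[Optional[int]] = field(default_factory=lambda: [None, None, None])
    count: List[int] = field(default_factory=lambda: [0, 0, 0])


@dataclass
class Statement:
    ids: Tuple[int, int, int]   # subject, predicate, object resource IDs
    next: List[Optional[int]]   # next statement using the same resource, per position


class TinyStore:
    def __init__(self) -> None:
        self.id_of: Dict[str, int] = {}         # string -> resource ID
        self.resources: List[Resource] = []     # resource ID n is stored at index n - 1
        self.statements: List[Statement] = []   # statement ID n is stored at index n - 1

    def _intern(self, name: str) -> int:
        if name not in self.id_of:
            self.resources.append(Resource(name))
            self.id_of[name] = len(self.resources)
        return self.id_of[name]

    def add(self, s: str, p: str, o: str) -> int:
        """Append a statement and link it at the head of three lists."""
        rids = (self._intern(s), self._intern(p), self._intern(o))
        sid = len(self.statements) + 1
        nxt: List[Optional[int]] = []
        for pos, rid in enumerate(rids):
            res = self.resources[rid - 1]
            nxt.append(res.first[pos])  # the old head becomes our successor
            res.first[pos] = sid        # the new statement becomes the head
            res.count[pos] += 1
        self.statements.append(Statement(rids, nxt))
        return sid

    def using(self, name: str, pos: int) -> Iterator[int]:
        """Yield IDs of statements that use `name` in position `pos`."""
        rid = self.id_of.get(name)
        sid = self.resources[rid - 1].first[pos] if rid is not None else None
        while sid is not None:
            yield sid
            sid = self.statements[sid - 1].next[pos]


store = TinyStore()
for triple in [("r1", "r2", "r3"), ("r1", "r4", "r5"), ("r1", "r4", "r6"),
               ("r3", "r2", "r7"), ("r3", "r4", "r6")]:
    store.add(*triple)

print(list(store.using("r1", S)))  # [3, 2, 1]
```

Inserting the five example statements in order reproduces the Resource Table values above. Insertion touches only the three list heads, so its cost does not grow with list length.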

Page 10: Semantic Web at BBN Parliament & ASIO SCOUT

List Length Means (Std Deviations)

Data Set    Size  Subject       Predicate              Non-Lit Object  Lit Object
Webscope    83M   3.96 (9.77)   87,900 (722,575)       3.43 (2,170)    4.33 (659)
Falcon      33M   4.22 (13)     983 (31,773)           2.56 (328)      2.31 (217)
Swoogle     175M  5.65 (36)     4,464 (188,023)        3.27 (1,793)    3.38 (569)
Watson      60M   5.58 (56)     3,040 (98,288)         2.87 (918)      2.91 (407)
SWSE-1      30M   5.25 (15)     25,404 (289,000)       2.46 (1,138)    2.29 (187)
SWSE-2      61M   5.37 (15)     83,773 (739,736)       2.89 (1,741)    2.87 (300)
DBpedia     110M  15 (39)       300,855 (3,560,666)    3.84 (148)      1.17 (22)
Geonames    70M   10.4 (1.66)   4,096,150 (3,167,048)  2.81 (1,623)    1.67 (15)
SwetoDBLP   15M   5.63 (3.82)   103,009 (325,380)      2.93 (629)      2.36 (168)
Wordnet     2M    4.18 (2.04)   47,387 (100,907)       2.53 (295)      2.39 (271)
Freebase    63M   4.45 (15)     12,329 (316,363)       2.79 (1,286)    1.83 (116)
US Census   446M  5.39 (9.18)   265,005 (1,921,537)    5.29 (15,916)   227 (115,616)
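One way to read these numbers: subject and object lists are usually short and predicate lists are very long, so a lookup with more than one bound position should walk the shortest list. The sketch below assumes the per-resource count fields are used to make that choice. The slides do not state this, and the function name `shortest_list` is hypothetical.

```python
from typing import Dict, Optional


def shortest_list(counts: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the bound position with the fewest statements.

    `counts` maps "subject" / "predicate" / "object" to the count field of the
    resource bound at that position, or to None when the position is a variable.
    Returns None when nothing is bound (a full scan would be needed).
    """
    bound = {pos: n for pos, n in counts.items() if n is not None}
    return min(bound, key=bound.get) if bound else None


# Illustrative only: DBpedia mean list lengths from the table, used as stand-in counts.
print(shortest_list({"subject": 15, "predicate": 300_855, "object": None}))  # subject
```

With these stand-in values, walking the subject list visits about 15 statements instead of roughly 300,000.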

Page 11: Semantic Web at BBN Parliament & ASIO SCOUT

Parliament
- Experiments demonstrate that Parliament maintains excellent query performance while significantly increasing insertion throughput and decreasing space requirements
- Future work will include:
  - Query optimization strategies
  - Analysis of Parliament’s internal rule engine
  - Further optimizations to the storage structure

Page 12: Semantic Web at BBN Parliament & ASIO SCOUT

Part 2: Asio Scout

Motivation: linking data across multiple data sources
- Underlying data is in different formats (RDBMS, Web Services, RDF) and different vocabularies
- Consolidating data does not solve the problem
- Different users need to use this data for different purposes, from different perspectives
- Use Semantic Web technology to link the data sources together in a flexible, evolvable way

Page 13: Semantic Web at BBN Parliament & ASIO SCOUT


Asio Scout

[Architecture diagram. Data sources: Web Service (WSDL); RDBMS. Ontologies: Ontology (OWL); Mapping Ontology (OWL) with SWRL Rules; Domain Source Ontology (OWL); Data Source Ontology (OWL), shown twice. Components: Semantic Query Decomposition (SQD); Backwards Rule Chaining; Semantic Bridge (Database); Semantic Bridge (Web Service); Semantic Bridge (SPARQL Endpoint); Parliament; Snoggle; Automapper.
Numbered steps: 1 Query: SPARQL; 2 Query Decomposition; 3 Generation of Sub Queries; 4 Data Access; 5 (label not recoverable); 6 Query Result Set.]

Page 14: Semantic Web at BBN Parliament & ASIO SCOUT

Federated Query

[Diagram. Data sources: RDBMS One; RDBMS Two; Web Service; SPARQL Endpoint. Components: Semantic Query Decomposition (SQD); Backwards Rule Chaining; Semantic Bridge (Rel. Database), shown twice; Semantic Bridge (SPARQL Endpoint); Semantic Bridge (Web Service).
Numbered steps: 1 SPARQL Query; 2 Query Decomposition; 3 Generation of Sub Queries; 4 Data Access; 5 (label not recoverable); 6 Query Result Set.]

Page 15: Semantic Web at BBN Parliament & ASIO SCOUT

Rule Expansion
- When the query is received, the system expands the query with the mapping rules provided
- Triples in the query are tagged with the rules that can produce them, and then are expanded into the body of the rule with variable unification
- This process iterates until the query cannot be expanded any further
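The expansion loop can be sketched as follows. This is an illustration of the general backward-chaining technique, not Scout's implementation. It assumes each rule has a single-triple head, that the rule set is non-recursive (otherwise the loop would not terminate), and that a triple matching any rule head is always rewritten. All prefixes, predicates, and function names are invented for the example.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, ...]            # (subject, predicate, object); "?x" marks a variable
Rule = Tuple[Triple, List[Triple]]  # (head, body)
Subst = Dict[str, str]

_fresh = count()  # suffix source for renaming rule variables apart


def is_var(term: str) -> bool:
    return term.startswith("?")


def walk(term: str, s: Subst) -> str:
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in s:
        term = s[term]
    return term


def unify(head: Triple, triple: Triple) -> Optional[Subst]:
    """Return a substitution making head and triple equal, or None."""
    s: Subst = {}
    for x, y in zip(head, triple):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s


def substitute(triple: Triple, s: Subst) -> Triple:
    return tuple(walk(term, s) for term in triple)


def rename(rule: Rule) -> Rule:
    """Give the rule's variables fresh names so they cannot clash with the query's."""
    n = next(_fresh)

    def fresh(triple: Triple) -> Triple:
        return tuple(f"{t}_{n}" if is_var(t) else t for t in triple)

    head, body = rule
    return fresh(head), [fresh(t) for t in body]


def expand(query: List[Triple], rules: List[Rule]) -> List[List[Triple]]:
    """Return every fully expanded conjunctive query, one per UNION branch."""
    for i, triple in enumerate(query):
        branches = []
        for rule in rules:
            head, body = rename(rule)
            s = unify(head, triple)
            if s is not None:
                rewritten = query[:i] + body + query[i + 1:]
                branches.append([substitute(t, s) for t in rewritten])
        if branches:  # rewrite the first expandable triple, then keep going
            return [q for b in branches for q in expand(b, rules)]
    return [query]    # nothing left to expand: only leaf (data source) triples remain


rules: List[Rule] = [
    (("?p", "ex:worksFor", "?o"), [("?p", "db:emp_dept", "?d"), ("?d", "db:dept_org", "?o")]),
    (("?p", "ex:worksFor", "?o"), [("?p", "ws:employer", "?o")]),
    (("?p", "ex:name", "?v"), [("?p", "db:emp_name", "?v")]),
]
query: List[Triple] = [("?x", "ex:worksFor", "ex:BBN"), ("?x", "ex:name", "?n")]

branches = expand(query, rules)
for branch in branches:
    print(branch)
```

Two rules can produce `ex:worksFor`, so the query expands into two branches, which a rewriter would combine with UNION. In both, the constant `ex:BBN` and the variable `?x` have been carried into the rule bodies by unification.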

Page 16: Semantic Web at BBN Parliament & ASIO SCOUT

Ontology Reasoning
- Subclass/Subproperty reasoning
  - This creates more possibilities for inferring query statements (in the way that you would expect)
- Disjoint Classes
  - Liberal use of disjointness statements in the ontology helps to reduce generated UNIONs in certain domain ontology situations
  - Pairwise disjointness can be asserted automatically for some data source ontologies
- Functional / Inverse Functional Properties
  - Many unbound variables introduced in the rule expansion stage are unified

Page 17: Semantic Web at BBN Parliament & ASIO SCOUT

Independent Domain Ontology

Because the domain ontology is defined independently of the data sources, it is not bound to the design decisions built into them

As data sources are added to or subtracted from the system, the domain ontology can remain constant

This is a key difference between Scout and other approaches

Page 18: Semantic Web at BBN Parliament & ASIO SCOUT

Practical Concerns
- New entities often have to be minted for the domain ontology
  - An additional SWRL builtin provides skolems
  - This results in extra processing in the query expansion stage
- In an RDBMS, negation is often meaningful
  - SPARQL can support querying for negation using the BOUND filter operator and OPTIONAL query blocks
  - Restricting this concept to leaf data source atoms allows query rewriting to remain valid
  - This has been a requirement for deploying this software in real situations
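The OPTIONAL / BOUND idiom for negation looks like the query string below, followed by a small in-memory emulation of what it computes: a left join, then a filter on the unbound variable. The `ex:` namespace, the employee/manager data, and the helper names are all made up for illustration; this is not how Scout evaluates the query.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# The SPARQL idiom: keep employees for which the OPTIONAL block found no match.
NEGATION_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?emp WHERE {
  ?emp a ex:Employee .
  OPTIONAL { ?emp ex:hasManager ?mgr }
  FILTER (!BOUND(?mgr))
}
"""

Row = Dict[str, Optional[str]]


def optional_join(left: Iterable[Row], right: List[Row], key: str, var: str) -> Iterator[Row]:
    """Left join: keep every left row, binding `var` only where `right` matches."""
    for row in left:
        matches = [r[var] for r in right if r[key] == row[key]]
        if matches:
            for value in matches:
                yield {**row, var: value}
        else:
            yield {**row, var: None}  # variable stays unbound


def not_bound(rows: Iterable[Row], var: str) -> Iterator[Row]:
    """FILTER (!BOUND(?var)): keep only rows where the optional part did not match."""
    return (row for row in rows if row[var] is None)


employees: List[Row] = [{"emp": "alice"}, {"emp": "bob"}, {"emp": "carol"}]
managed: List[Row] = [{"emp": "bob", "mgr": "alice"}]

joined = optional_join(employees, managed, "emp", "mgr")
unmanaged = [row["emp"] for row in not_bound(joined, "mgr")]
print(unmanaged)  # ['alice', 'carol']
```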

Page 19: Semantic Web at BBN Parliament & ASIO SCOUT

Performance
- Preprocessing of a query takes milliseconds
- “Streaming” results means that you start getting query answers back quickly, even if there are many results
- The graphs within the SPARQL algebra are split by data source individually
  - This involves batching database queries
- The processing has very little overhead over just executing the queries
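One plausible reading of streaming plus batching, sketched with Python generators and an in-memory SQLite database: keys arriving from one source are grouped into small batches, each batch becomes a single `IN (...)` query against the relational source, and rows are yielded as soon as each batch returns. The table, column names, and batch size are assumptions for the example, and Scout's actual batching strategy is not described in the slides.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group an arbitrary (possibly lazy) iterable into lists of at most `size`."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_join(keys: Iterable[str], db: sqlite3.Connection,
                batch_size: int = 2) -> Iterator[Tuple[str, str]]:
    """Look up keys in batches and yield rows as soon as each batch returns."""
    for chunk in batched(keys, batch_size):
        placeholders = ",".join("?" * len(chunk))
        sql = (f"SELECT emp, dept FROM emp_dept "
               f"WHERE emp IN ({placeholders}) ORDER BY emp")
        yield from db.execute(sql, chunk)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp_dept (emp TEXT, dept TEXT)")
db.executemany("INSERT INTO emp_dept VALUES (?, ?)",
               [("alice", "R&D"), ("bob", "Ops"), ("carol", "R&D")])

# Keys would normally stream in from another data source; a list stands in here.
results = stream_join(iter(["alice", "bob", "carol", "dave"]), db)
first = next(results)  # available after only the first batch has been queried
print(first)           # ('alice', 'R&D')
```

The first answer arrives after one database round trip, and the number of round trips grows with the number of batches rather than the number of keys.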

Page 20: Semantic Web at BBN Parliament & ASIO SCOUT

SHARD
- SHARD is released open-source. BSD license.
- Look at:
  - My webpage (Search for “SHARD krohloff”)
  - Sourceforge (SHARD-3store)
- Use svn to get code:
  svn co https://shard-3store.svn.sourceforge.net/svnroot/shard-3store shard-3store
  - Don’t worry - this command is on SourceForge!
- Happy to talk offline:
  - Cloud computing and SHARD
  - Use of SHARD, open-source projects, etc…


Page 21: Semantic Web at BBN Parliament & ASIO SCOUT

SHARD Design Overview

Cloud-based triple-store on HDFS. Method calls at client. Processing in cloud via MapReduce jobs. Move results to local machine.

Massively scalable. Commodity hardware.

SPARQL queries. Optimize for complex queries with large response sets.

Basic inferencing.
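To make "processing via MapReduce jobs" concrete, here is an in-process stand-in for a single job that joins two triple patterns on a shared variable. It is not SHARD's code and does not use Hadoop; the `map_reduce` driver, the query, and the data are all invented for the sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Tagged = Tuple[str, str]


def map_reduce(records: Iterable[Triple],
               mapper: Callable[[Triple], Iterable[Tuple[str, Tagged]]],
               reducer: Callable[[str, List[Tagged]], Iterable[Tuple[str, str, str]]],
               ) -> Iterator[Tuple[str, str, str]]:
    """Tiny local driver: map every record, group values by key, reduce each group."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    for key in sorted(groups):
        yield from reducer(key, groups[key])


# Query being answered:  ?person worksFor ?org .  ?org locatedIn ?city
def mapper(triple: Triple) -> Iterator[Tuple[str, Tagged]]:
    s, p, o = triple
    if p == "worksFor":
        yield o, ("person", s)   # key on the shared variable ?org
    elif p == "locatedIn":
        yield s, ("city", o)


def reducer(org: str, values: List[Tagged]) -> Iterator[Tuple[str, str, str]]:
    people = [v for tag, v in values if tag == "person"]
    cities = [v for tag, v in values if tag == "city"]
    for person in people:        # emit one binding per matching combination
        for city in cities:
            yield person, org, city


triples: List[Triple] = [
    ("alice", "worksFor", "BBN"),
    ("bob", "worksFor", "Acme"),
    ("BBN", "locatedIn", "Cambridge"),
]
print(list(map_reduce(triples, mapper, reducer)))  # [('alice', 'BBN', 'Cambridge')]
```

Keying both patterns on the shared variable brings matching triples to the same reducer. A query with more clauses would chain further jobs of the same shape.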


Page 22: Semantic Web at BBN Parliament & ASIO SCOUT

A Map-Reduce Implementation

- Open implementation of Google’s tech. Developed from Google publications. VERY large-scale!
  http://hadoop.apache.org/
- Cloudera has great training material. Look for VMWare training virtual machine.
  http://www.cloudera.com/

Baked-in robustness makes it practical…
