Semantic Web at BBN: Parliament & ASIO SCOUT
Dave Kolas, March 23, 2011
Semantic Technology at BBN
Research: contributing to standards and technologies
- Contributing authors to OWL and SWRL
- Currently developing a new semantics-based reasoning language, SILK
- Active in the Geospatial Semantic Web community, including GeoSPARQL
Applications: addressing real-world, operational challenges
- Intelligence
- Data integration and disambiguation
- Geospatial Image Applications
- Analytics with Semantic Web underpinnings
Key BBN-Led Semantic Initiatives
[Timeline figure, 2000 to 2010] Projects shown: DAML Integration & Transition (DARPA); JFP ACTD (JFCOM); FCG (AFRL/AMC); NOTAMS (AFRL/AMC); Horus (DARPA/IMO); Combine/APSTARS; GARCON-F (NGA); Geospatial SW (NGA); IEII (DARPA); ICEWS (DARPA); Integrated Learning (DARPA); Multi-INT Fusion (LM); DIESL (DARPA); SID; SASSI/MMON; CODE/COBRA; ISSL; PINT; Parliament; Medical and commercial applications. Milestones: W3C OWL Recommendation; BBN hosts ISWC 2009. Sites: SemWebCentral.org, asio.bbn.com.
Parliament
- In continuous customer use for ~8 years (originally DAML-DB)
- Triple store with SPARQL support
- Implemented as a persistence layer for Jena/Sesame
- Includes spatial and temporal indexing/processing
- Open source! http://parliament.semwebcentral.org/
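Since Parliament supports SPARQL, a client can in principle query it through the standard W3C SPARQL 1.1 Protocol. The following is a minimal sketch, not Parliament's documented client API: the endpoint URL is a placeholder assumption, and build_query_url, parse_bindings, and run_query are hypothetical helpers. It builds a protocol GET request and flattens a response in the SPARQL 1.1 Query Results JSON format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder address (assumption): use whatever URL your deployment exposes.
ENDPOINT = "http://localhost:8089/parliament/sparql"

# Example response in the SPARQL 1.1 Query Results JSON format.
SAMPLE_RESULTS = """{"head": {"vars": ["s", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/a"},
     "o": {"type": "literal", "value": "42"}}]}}"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the protocol's "query" GET parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json: str) -> list:
    """Flatten JSON results into one {variable: value} dict per solution."""
    data = json.loads(results_json)
    return [{var: term["value"] for var, term in row.items()}
            for row in data["results"]["bindings"]]


def run_query(endpoint: str, query: str) -> list:
    """Send a SELECT query and ask for JSON results via the Accept header."""
    request = Request(build_query_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(parse_bindings(SAMPLE_RESULTS))  # works offline, no server needed
```

Only run_query touches the network; the other two helpers can be exercised offline against canned responses.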
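Design
[Architecture diagram] Components shown: Joseki; Parliament Graph and Model; Indexing Graph; Spatial Index Processor with its Spatial Index (PostGIS); Temporal Index Processor with its Temporal Index (BDB); Parliament. Regions labeled in the diagram: Part of Jena; Parliament Framework; External Storage.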
Parliament’s Index Structure
- Applications often require efficient statement insertion
- Goal: balance insertion performance, query performance, and space requirements
- Parliament stores triples using two components:
  - Resource dictionary
  - Statement table
Parliament Statement Table
Each entry (statement) contains:
- Three resource ID fields: the subject, predicate, and object of the statement
- Three statement ID fields: the next statements using the same resource as subject, predicate, and object
- Bit-field flags encoding statement attributes
Parliament Resource Dictionary
Each entry (resource) contains:
- Bidirectional string-to-ID mapping
- Three statement ID fields: the first statements using this resource as subject, predicate, and object
- Three count fields: the numbers of statements using this resource as subject, predicate, and object
- Bit-field flags encoding resource attributes
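Parliament Index Example
Resource Table (FS/FP/FO: first statement using the resource as subject/predicate/object; SC/PC/OC: subject/predicate/object counts; "-" means none)
ID  FS  FP  FO  SC  PC  OC
1   3   -   -   3   0   0
2   -   4   -   0   2   0
3   5   -   1   2   0   1
4   -   5   -   0   3   0
5   -   -   2   0   0   1
6   -   -   5   0   0   2
7   -   -   4   0   0   1
Statement List
ID  S  P  O
1   1  2  3
2   1  4  5
3   1  4  6
4   3  2  7
5   3  4  6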
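The index example above can be reproduced with a small in-memory model. This is a hedged sketch, not Parliament's actual implementation: it assumes each new statement is pushed onto the head of its subject, predicate, and object lists (which is consistent with the FS/FP/FO values in the example), uses hypothetical resource names r1 to r7, and omits the bit-field flags and duplicate-statement detection.

```python
from dataclasses import dataclass, field
from typing import Iterator

SUBJ, PRED, OBJ = 0, 1, 2  # positions within a triple


@dataclass
class ResourceEntry:
    name: str
    first: list = field(default_factory=lambda: [None, None, None])  # FS, FP, FO
    count: list = field(default_factory=lambda: [0, 0, 0])           # SC, PC, OC


@dataclass
class StatementEntry:
    ids: tuple       # (subject ID, predicate ID, object ID)
    next_ids: list   # next statement sharing this subject, predicate, object


class ParliamentStyleIndex:
    """Resource dictionary plus statement table; IDs are 1-based like the example."""

    def __init__(self) -> None:
        self.name_to_id: dict = {}   # string -> ID
        self.resources: list = []    # resources[ID - 1].name gives ID -> string
        self.statements: list = []

    def resource_id(self, name: str) -> int:
        if name not in self.name_to_id:
            self.resources.append(ResourceEntry(name))
            self.name_to_id[name] = len(self.resources)
        return self.name_to_id[name]

    def insert(self, s: str, p: str, o: str) -> int:
        """Append a statement and push it onto the head of three linked lists."""
        ids = (self.resource_id(s), self.resource_id(p), self.resource_id(o))
        stmt_id = len(self.statements) + 1
        next_ids = []
        for pos, rid in enumerate(ids):
            entry = self.resources[rid - 1]
            next_ids.append(entry.first[pos])  # old head becomes the successor
            entry.first[pos] = stmt_id         # constant-time insertion at the head
            entry.count[pos] += 1
        self.statements.append(StatementEntry(ids, next_ids))
        return stmt_id

    def statements_with(self, name: str, pos: int) -> Iterator[int]:
        """Yield IDs of statements that use `name` at position `pos`."""
        rid = self.name_to_id.get(name)
        sid = self.resources[rid - 1].first[pos] if rid else None
        while sid is not None:
            yield sid
            sid = self.statements[sid - 1].next_ids[pos]


EXAMPLE = [("r1", "r2", "r3"), ("r1", "r4", "r5"), ("r1", "r4", "r6"),
           ("r3", "r2", "r7"), ("r3", "r4", "r6")]
idx = ParliamentStyleIndex()
for triple in EXAMPLE:
    idx.insert(*triple)
print(list(idx.statements_with("r1", SUBJ)))  # [3, 2, 1]
```

Inserting the five statements in order yields the FS/FP/FO and SC/PC/OC columns of the resource table, and walking a list touches only the statements that actually use the resource in that position.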
List Length Means (Standard Deviations)
Data Set   Size  Subject      Predicate              Non-Lit Object  Lit Object
Webscope   83M   3.96 (9.77)  87,900 (722,575)       3.43 (2,170)    4.33 (659)
Falcon     33M   4.22 (13)    983 (31,773)           2.56 (328)      2.31 (217)
Swoogle    175M  5.65 (36)    4,464 (188,023)        3.27 (1,793)    3.38 (569)
Watson     60M   5.58 (56)    3,040 (98,288)         2.87 (918)      2.91 (407)
SWSE-1     30M   5.25 (15)    25,404 (289,000)       2.46 (1,138)    2.29 (187)
SWSE-2     61M   5.37 (15)    83,773 (739,736)       2.89 (1,741)    2.87 (300)
DBpedia    110M  15 (39)      300,855 (3,560,666)    3.84 (148)      1.17 (22)
Geonames   70M   10.4 (1.66)  4,096,150 (3,167,048)  2.81 (1,623)    1.67 (15)
SwetoDBLP  15M   5.63 (3.82)  103,009 (325,380)      2.93 (629)      2.36 (168)
Wordnet    2M    4.18 (2.04)  47,387 (100,907)       2.53 (295)      2.39 (271)
Freebase   63M   4.45 (15)    12,329 (316,363)       2.79 (1,286)    1.83 (116)
US Census  446M  5.39 (9.18)  265,005 (1,921,537)    5.29 (15,916)   227 (115,616)
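Each cell in the table is the mean (and standard deviation) of per-resource statement list lengths; note that predicate lists are orders of magnitude longer than subject and object lists in every data set. The slides do not say exactly how these figures were computed, so the sketch below makes two labeled assumptions: only resources with a nonzero count at a position are included, and the population standard deviation is used. The COUNTS are the SC/PC/OC values from the index example, not any of the listed data sets, and the literal/non-literal object split is not modeled.

```python
from statistics import mean, pstdev

# (SC, PC, OC) per resource ID, copied from the index example's resource table
COUNTS = {
    1: (3, 0, 0), 2: (0, 2, 0), 3: (2, 0, 1), 4: (0, 3, 0),
    5: (0, 0, 1), 6: (0, 0, 2), 7: (0, 0, 1),
}


def list_length_stats(counts: dict, pos: int) -> tuple:
    """Mean and population std. dev. of non-empty list lengths at one position."""
    lengths = [c[pos] for c in counts.values() if c[pos] > 0]
    return mean(lengths), pstdev(lengths)


for label, pos in (("Subject", 0), ("Predicate", 1), ("Object", 2)):
    m, sd = list_length_stats(COUNTS, pos)
    print(f"{label}: {m:.2f} ({sd:.2f})")
```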
Parliament
- Experiments demonstrate that Parliament maintains excellent query performance while significantly increasing insertion throughput and decreasing space requirements
- Future work will include:
  - Query optimization strategies
  - Analysis of Parliament’s internal rule engine
  - Further optimizations to the storage structure
Part 2: Asio Scout
Motivation: linking data across multiple data sources
- Underlying data is in different formats (RDBMS, Web Services, RDF) and uses different vocabularies
- Consolidating data does not solve the problem
- Different users need to use this data for different purposes, from different perspectives
- Use Semantic Web technology to link the data sources together in a flexible, evolvable way
Asio Scout
[Architecture diagram] Query flow: SPARQL query, query decomposition, generation of subqueries, data access, query result set. Components shown: Semantic Query Decomposition (SQD) with backwards rule chaining; domain ontology (OWL); mapping ontology (OWL) with SWRL rules; data source ontologies (OWL); Semantic Bridges for a database, a web service (WSDL), and a SPARQL endpoint; Snoggle; Parliament; Automapper. Data sources: RDBMS One, RDBMS Two, Web Service, SPARQL Endpoint.
Federated Query
[Diagram] The same flow (SPARQL query, query decomposition, generation of subqueries, data access, query result set) through Semantic Query Decomposition (SQD) with backwards rule chaining, with Semantic Bridges for relational databases (two shown), a SPARQL endpoint, and a web service.
Rule Expansion
- When the query is received, the system expands it with the mapping rules provided
- Triples in the query are tagged with the rules that can produce them, and are then expanded into the body of the rule with variable unification
- This process is iterative, repeating until the query cannot be expanded any further
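The expansion loop described above can be sketched as backward chaining over triple patterns. This is a hedged illustration, not Scout's implementation: rules are (head, body) pairs whose heads are single triple patterns, matching rules are found by unification rather than by a separate tagging pass, rules are assumed to be non-recursive so the loop terminates, and the dom:, db1:, and ws: vocabularies are hypothetical. Each rule that can produce a query triple contributes one branch, so the result is a list of alternative source-level queries that would be combined with UNION.

```python
from itertools import count
from typing import Optional

_fresh = count()  # counter for renaming rule variables apart


def is_var(term: str) -> bool:
    return term.startswith("?")


def walk(term: str, subst: dict) -> str:
    """Follow bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a: tuple, b: tuple, subst: dict) -> Optional[dict]:
    """Unify two triple patterns, returning an extended substitution or None."""
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None  # two different constants: this rule cannot produce the triple
    return subst


def rename(rule: tuple) -> tuple:
    """Give the rule's variables fresh names so they cannot clash with the query's."""
    n = next(_fresh)

    def fix(term: str) -> str:
        return f"{term}_{n}" if is_var(term) else term

    head, body = rule
    return tuple(map(fix, head)), [tuple(map(fix, t)) for t in body]


def expand(query: list, rules: list) -> list:
    """Expand a conjunctive query into alternative source-level queries."""
    finished, pending = [], [(query, {})]
    while pending:
        goals, subst = pending.pop()
        for i, goal in enumerate(goals):
            branches = []
            for rule in rules:
                head, body = rename(rule)
                extended = unify(head, goal, subst)
                if extended is not None:
                    # Replace the goal with the rule body; one branch per matching rule
                    branches.append((goals[:i] + body + goals[i + 1:], extended))
            if branches:
                pending.extend(branches)
                break
        else:  # no goal matches any rule head: fully expanded
            finished.append([tuple(walk(t, subst) for t in g) for g in goals])
    return finished


RULES = [
    (("?x", "rdf:type", "dom:Person"), [("?x", "rdf:type", "db1:Employee")]),
    (("?x", "dom:name", "?y"), [("?x", "db1:full_name", "?y")]),
    (("?x", "dom:name", "?y"), [("?x", "ws:personName", "?y")]),
]
QUERY = [("?p", "rdf:type", "dom:Person"), ("?p", "dom:name", "?n")]
for branch in expand(QUERY, RULES):
    print(branch)
```

Here the name triple can be produced by either source, so the query expands into two alternatives that share the same employee-type atom.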
Ontology Reasoning
- Subclass/subproperty reasoning
  - This creates more possibilities for inferring query statements (in the way that you would expect)
- Disjoint classes
  - Liberal use of disjointness statements in the ontology helps to reduce generated UNIONs in certain domain ontology situations
  - Pairwise disjointness can be asserted automatically for some data source ontologies
- Functional / inverse functional properties
  - These allow many of the unbound variables introduced in the rule expansion stage to be unified
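To illustrate the functional-property point, here is a minimal sketch with hypothetical helper names and made-up ex: terms (foaf:mbox, used in the test, is a real inverse functional property in FOAF). If two triples share a subject and a functional predicate, their objects must denote the same resource, and symmetrically for inverse functional properties, so one variable can be substituted for the other. The sketch assumes distinct constants denote distinct resources and reports such a query as unsatisfiable; OWL itself makes no unique-name assumption.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def find_merge(triples, functional, inverse_functional):
    """Return two terms that must denote the same resource, or None."""
    for i, (s1, p1, o1) in enumerate(triples):
        for s2, p2, o2 in triples[i + 1:]:
            if p1 != p2:
                continue
            if p1 in functional and s1 == s2 and o1 != o2:
                return o1, o2  # one subject has only one value: the objects coincide
            if p1 in inverse_functional and o1 == o2 and s1 != s2:
                return s1, s2  # one value identifies only one subject
    return None


def unify_functional(triples, functional=frozenset(), inverse_functional=frozenset()):
    """Merge terms until no more merges apply; None means the query cannot match."""
    triples = list(dict.fromkeys(triples))  # drop duplicate triples, keep order
    while (pair := find_merge(triples, functional, inverse_functional)) is not None:
        a, b = pair
        if not is_var(a) and not is_var(b):
            return None  # distinct constants (unique-name assumption)
        old, new = (a, b) if is_var(a) else (b, a)
        triples = list(dict.fromkeys(
            tuple(new if term == old else term for term in t) for t in triples))
    return triples


# Two rule expansions each introduced their own ID variable for the same person.
EXPANDED = [("?p", "ex:empId", "?id1"), ("?p", "ex:empId", "?id2"),
            ("?id1", "ex:salary", "?s"), ("?id2", "ex:dept", "?d")]
print(unify_functional(EXPANDED, functional={"ex:empId"}))
```

Each merge removes one variable, so the loop always terminates, and the merged query needs one fewer join.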
Independent Domain Ontology
- Because the domain ontology is defined without links to the data sources, it is not bound to the design decisions incorporated in them
- As data sources are added to or removed from the system, the domain ontology can remain constant
- This is a key difference between Scout and other approaches
Practical Concerns
- New entities often have to be minted for the domain ontology
  - An additional SWRL builtin provides skolems
  - This results in extra processing in the query expansion stage
- In an RDBMS, negation is often meaningful
  - SPARQL can support querying for negation using the BOUND filter operator and OPTIONAL query blocks
  - Restricting this concept to leaf data source atoms allows query rewriting to remain valid
  - This has been a requirement for deploying this software in real situations
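The slides do not describe how the skolem builtin forms identifiers. One possible approach, sketched below under that caveat, derives the IRI deterministically from a skolem function name and the source key, so the same database row always yields the same entity, and keeps the encoding reversible so a query rewriter can recover the key from an IRI that appears in a query. The namespace, the function names, and the addressOf example are hypothetical.

```python
from urllib.parse import quote, unquote

BASE = "http://example.org/skolem/"  # hypothetical namespace for minted entities


def skolem_iri(function: str, *args: str) -> str:
    """Mint the same IRI every time for the same function name and arguments."""
    parts = [quote(function, safe="")] + [quote(arg, safe="") for arg in args]
    return BASE + "/".join(parts)


def parse_skolem(iri: str) -> tuple:
    """Invert skolem_iri so a rewriter can turn a minted IRI back into source keys."""
    if not iri.startswith(BASE):
        raise ValueError(f"not a minted IRI: {iri}")
    function, *args = [unquote(part) for part in iri[len(BASE):].split("/")]
    return function, tuple(args)


# Hypothetical mapping: each employee row implies an Address entity with no key of its own.
address = skolem_iri("addressOf", "emp/42")
print(address)
print(parse_skolem(address))
```

Percent-encoding every argument with safe="" escapes any "/" inside a key, which is what makes splitting on "/" safe when decoding.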
Performance
- Preprocessing of a query takes milliseconds
- "Streaming" results means that you start getting query answers back quickly, even if there are many results
- The graphs within the SPARQL algebra are split individually by data source
  - This involves batching database queries
- The processing adds very little overhead over just executing the queries
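Scout's internals are not shown in the slides; the following is a hedged sketch of the two ideas on this slide, streaming and batching, using an in-memory SQLite table as a stand-in data source. Keys (for example, values bound by an earlier subquery) are grouped into IN-clause queries, and rows are yielded as soon as each batch completes, so the caller sees answers before all queries have run. The person table, its columns, and the helper names are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Group a (possibly lazy) stream of items into lists of at most `size`."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def stream_rows(conn: sqlite3.Connection, keys: Iterable,
                batch_size: int = 100) -> Iterator[tuple]:
    """Fetch rows for `keys` with one IN query per batch, yielding as each completes."""
    for batch in batched(keys, batch_size):
        placeholders = ",".join("?" * len(batch))
        sql = f"SELECT id, name FROM person WHERE id IN ({placeholders}) ORDER BY id"
        yield from conn.execute(sql, batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f"p{i}") for i in range(1, 251)])

rows = stream_rows(conn, range(1, 251, 2), batch_size=50)  # 125 keys, 3 queries
print(next(rows))  # available after only the first batch has run
```

Batching trades a few larger queries for many single-key lookups, while the generator keeps the first results flowing early.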
SHARD
SHARD is released open source under a BSD license. Look at:
- My webpage (search for “SHARD krohloff”)
- SourceForge (SHARD-3store)
Use svn to get the code:
svn co https://shard-3store.svn.sourceforge.net/svnroot/shard-3store shard-3store
Don’t worry, this command is on SourceForge!
Happy to talk offline about cloud computing and SHARD: use of SHARD, open-source projects, etc.
SHARD Design Overview
- Cloud-based triple store on HDFS
- Method calls at the client; processing in the cloud via MapReduce jobs; results moved to the local machine
- Massively scalable on commodity hardware
- SPARQL queries, optimized for complex queries with large response sets
- Basic inferencing
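SHARD's specific job layout is not described on this slide. The sketch below only simulates, in plain Python, the general reduce-side join pattern that a MapReduce triple store can use for a SPARQL basic graph pattern: mappers emit each matching triple's bindings keyed by the join variable, the shuffle groups them, and reducers combine bindings that agree. All function names and ex: terms are hypothetical, and a real deployment would run these phases as Hadoop jobs over HDFS rather than in one process.

```python
from collections import defaultdict
from itertools import product


def match(pattern: tuple, triple: tuple):
    """Return bindings if the triple matches the pattern ("?"-terms are variables)."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None  # same variable bound to two different terms
        elif p != t:
            return None
    return binding


def map_phase(triples, patterns, join_var):
    """Mapper: emit (join value, (pattern index, bindings)) for every match."""
    for triple in triples:
        for index, pattern in enumerate(patterns):
            binding = match(pattern, triple)
            if binding is not None and join_var in binding:
                yield binding[join_var], (index, binding)


def reduce_phase(grouped, n_patterns):
    """Reducer: for each join value, combine one binding from every pattern."""
    for values in grouped.values():
        per_pattern = [[b for i, b in values if i == k] for k in range(n_patterns)]
        for combo in product(*per_pattern):
            merged = {}
            # Keep the combination only if all shared variables agree.
            if all(merged.setdefault(v, t) == t for b in combo for v, t in b.items()):
                yield merged


def mapreduce_join(triples, patterns, join_var):
    grouped = defaultdict(list)  # plays the role of the shuffle/sort step
    for key, value in map_phase(triples, patterns, join_var):
        grouped[key].append(value)
    return list(reduce_phase(grouped, len(patterns)))


TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:bbn"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]
PATTERNS = [("?p", "ex:worksFor", "ex:bbn"), ("?p", "ex:name", "?n")]
print(mapreduce_join(TRIPLES, PATTERNS, "?p"))
```

Join values with no binding for some pattern (ex:bob here) produce an empty product in the reducer and so drop out, which is exactly the inner-join semantics of a basic graph pattern.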
A Map-Reduce Implementation
- Hadoop (http://hadoop.apache.org/): an open implementation of Google’s MapReduce technology, developed from Google publications. VERY large-scale!
- Cloudera has great training material; look for the VMware training virtual machine: http://www.cloudera.com/
- Baked-in robustness makes it practical…