Dave Lush, Senior SME Aha! Analytics
Transcript of Dave Lush, Senior SME Aha! Analytics
This Briefing is: UNCLASSIFIED
Aha! Analytics, 2278 Baldwin Drive
Phone: (937) 477-2983, FAX: (866) 450-3812
Dave Lush, Senior SME
Aha! Analytics
Semantic Integration Layer for
The As-Is Enterprise Data Warehouse
Purpose(s)
Communicate Some Observations About the General Data Integration Problem
Cite and Discuss the Semantic Technologies
Propose a Semantic Data Integration Layer for the General Data Warehouse Architecture
Discuss a LexisNexis SSI (LNSSI) Data Analytics Supercomputer (DAS) Based Solution
Present Initial Thoughts on the Plan
Topics
Purpose
Background
Givens/Problems/Tasks
Approaches to Data/Info Integration
Semantic Technologies
General Solution Architecture
LNSSI DAS Based Solution Architecture
Thoughts On the Plan
Background
Data Integration Problems
Application and Enterprise Model Based Approaches
Data Integration Problems Persist
Not Adequately Leveraging Available Metadata
Need for Improved Discovery and Semantic Integration
Emergence of Semantic Technologies
Emergence of LNSSI DAS Capability
The Primary Givens/Problem/Task
Givens:
A Collection of Disparate Legacy Databases, Perhaps Already Migrated to an Enterprise Data Warehouse, Each with Its Own Independently Developed Logical Data Model and Query Interface
The Requirement To Pose Single Unified Queries Across The Collection Of Legacy Databases And Achieve Semantically Consistent (Coherent) Results
The Problem:
Difficulties in Achieving Useful Results Because of Unresolved Semantic Disconnects in the Disparate Logical Models
Note: The Problem Is Not Primarily One of Discovery of Relevant Already Existing Product Objects But Rather One of Discovering and Semantically Integrating Requisite Product Content From Multiple Sources
The Task at Hand:
Define, Design, and Implement a Capability for the Semantic Integration and Unified Query of the Collection of Disparate Legacy Databases to Achieve Semantically Coherent Results
Basic Data/Info Integration Approaches
Application Centric Approach
Do It All in the Application Layer Via Ad Hoc Hand Coding
This Is Very Expensive And Difficult!
Enterprise Information Model and Data Warehouse Approach
Do It Via EDW ETL Methods/Tools in Context of Strict Conformance with Overarching Enterprise Info Model
This Is Also Very Expensive And Difficult And Requires Great Discipline!
Enterprise Information Integration (EII) Approach
Establish Common Single View of Disparate Legacy Sources
Process/Parse Common Domain-wide Queries into Individual Legacy Source Queries and Execute Source Queries
Integrate Source Query Results into Unified Response to the Domain-wide Query
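The three EII steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a product design: the two in-memory SQLite databases, their table and column names, the `SOURCE_MAPPINGS` structure, and the name-normalization rule are all hypothetical stand-ins for real legacy sources.

```python
import sqlite3

def make_sources():
    """Build two hypothetical legacy sources with independently designed schemas."""
    hr = sqlite3.connect(":memory:")
    hr.execute("CREATE TABLE staff (emp_no INTEGER, full_name TEXT, dept TEXT)")
    hr.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                   [(1, "Ada Jones", "LOG"), (2, "Raj Patel", "FIN")])
    pay = sqlite3.connect(":memory:")
    pay.execute("CREATE TABLE payroll (id INTEGER, surname_first TEXT, unit_code TEXT)")
    pay.executemany("INSERT INTO payroll VALUES (?, ?, ?)", [(7, "Lee, Kim", "LOG")])
    return {"hr": hr, "pay": pay}

# Step 1: the common single view. Each source maps the view's fields
# (person_id, name, department) onto its own table and columns.
SOURCE_MAPPINGS = {
    "hr": {"table": "staff",
           "fields": {"person_id": "emp_no", "name": "full_name", "department": "dept"}},
    "pay": {"table": "payroll",
            "fields": {"person_id": "id", "name": "surname_first", "department": "unit_code"}},
}

def to_source_query(source, where_field):
    """Step 2: rewrite a query on the common view into one source's SQL."""
    mapping = SOURCE_MAPPINGS[source]
    cols = ", ".join(f"{col} AS {field}" for field, col in mapping["fields"].items())
    return f"SELECT {cols} FROM {mapping['table']} WHERE {mapping['fields'][where_field]} = ?"

def normalize_name(name):
    """Reconcile the 'Surname, First' convention with 'First Surname'."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name

def unified_query(sources, where_field, value):
    """Step 3: run every source query and integrate the rows into one response."""
    results = []
    for source, conn in sources.items():
        sql = to_source_query(source, where_field)
        for person_id, name, department in conn.execute(sql, (value,)):
            results.append({"source": source, "person_id": person_id,
                            "name": normalize_name(name), "department": department})
    return results

rows = unified_query(make_sources(), "department", "LOG")
```

Even in this toy form, the mapping table and the normalization rule carry all of the semantics; the slides that follow argue for moving exactly that knowledge out of application code.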
The Basic Data Integration Challenge
Application
Data Interface (three instances)
Legacy Databases
The application must process the unified query, formulate and submit associated queries against the disparate databases, and properly integrate the results into a unified response.
This requires that the application handle disparate data interfaces, and that it contain the necessary semantics regarding the problem domain, the relationships/mappings between the problem domain and the legacy data models, and the code that accomplishes the mappings.
Logical models for these databases were generally developed independently of each other.
The Enterprise Data Warehouse Approach
Enterprise Data Warehouse
Application
Data Warehouse Services Layer
The legacy databases are migrated to a data warehouse in the context of an overarching enterprise data model so that the logical data models for the individual databases are semantically consistent with the overall model.
The application still must process the unified query, formulate and submit associated queries against the disparate warehouse databases, and properly integrate the results into a unified response.
But in theory this process should not have serious semantic inconsistency problems, because the individual logical databases in the warehouse are supposed to have logical models that are consistent with an overarching enterprise information model.
target data
Logical models for these databases are consistent with the overarching domain model
Extract Transform Load (ETL) Services
Common Enterprise Model & Meta-data
Meta-Data Mgt Tool
source data
Problems Ensue
The Imperative to Abide by a Standard Global Data Model Does Not Prevail
Stove Piped DBs Abound
Semantics of the Stove Pipes Are Inconsistent
Federated Queries Yield Semantically Inconsistent Results
Cannot Replace/Re-engineer Legacy DBs Housed in the EDW
Cannot Replace the EDW Platform (e.g., Teradata) In Use Today
New Imperative
Must Have Some Effective Way to Semantically Integrate the Information Acquired from the Multiplicity of Databases
Semantic Integration
Use Semantic Technologies in Context of the EII Approach (cited previously)
Unified Ontology of Current Situation/View Is Developed and Expressed in OWL or Appropriate Successor Language
Semantic Relationships Between Legacy Data and Rules for Transformation From Legacy to Current View Are Specified and Captured Via OWL or Appropriate Successor Language
Queries in Terms of Current Unified View Are Parsed and Transformed Into Queries of Legacy Sources by a Semantic Query Engine.
Individual Legacy Source Queries Are Executed.
Results Are Transformed and Processed Into a Unified Response by the Semantic Mash-up Engine.
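One way to picture the linkage rules and the query transformation described above is the sketch below. It is illustrative only: the `LinkRule` class, the source names `fleet_db` and `depot_db`, their field names, and the aircraft-range example are all hypothetical, and plain Python objects stand in for rules that the slides propose to capture in OWL.

```python
from dataclasses import dataclass
from typing import Callable

NM_TO_KM = 1.852  # one nautical mile is exactly 1.852 km

@dataclass(frozen=True)
class LinkRule:
    """Hypothetical linkage rule: one unified-view property mapped to one legacy field."""
    source: str
    legacy_field: str
    to_unified: Callable[[float], float]  # legacy value -> unified value
    to_legacy: Callable[[float], float]   # unified value -> legacy value

# Linkage rules for the unified property "range_km" (illustrative only).
RULES = {
    "range_km": [
        LinkRule("fleet_db", "RANGE_NM", lambda v: v * NM_TO_KM, lambda v: v / NM_TO_KM),
        LinkRule("depot_db", "max_range_km", lambda v: v, lambda v: v),
    ]
}

# Stand-ins for legacy sources: records in each source's own vocabulary and units.
LEGACY = {
    "fleet_db": [{"TAIL": "A1", "RANGE_NM": 3000.0}, {"TAIL": "A2", "RANGE_NM": 1000.0}],
    "depot_db": [{"tail_no": "B7", "max_range_km": 6000.0}],
}
ID_FIELD = {"fleet_db": "TAIL", "depot_db": "tail_no"}

def rewrite(prop, minimum):
    """Transform a unified-view filter into one filter per legacy source."""
    return [(rule, rule.to_legacy(minimum)) for rule in RULES[prop]]

def execute(rule, legacy_minimum):
    """Run one legacy query in the source's own terms."""
    return [rec for rec in LEGACY[rule.source] if rec[rule.legacy_field] >= legacy_minimum]

def mash_up(prop, minimum):
    """Run the rewritten queries and express every result in unified terms."""
    unified = []
    for rule, legacy_minimum in rewrite(prop, minimum):
        for rec in execute(rule, legacy_minimum):
            unified.append({"id": rec[ID_FIELD[rule.source]],
                            prop: round(rule.to_unified(rec[rule.legacy_field]), 1)})
    return unified

result = mash_up("range_km", 5000.0)
```

The point of the design is that the threshold is converted once per source by its rule, so each legacy query runs in that source's own units and only the final response is expressed in the unified view.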
Semantic Technologies
Rapidly Maturing with Very Noteworthy Applications
Enhanced Knowledge Discovery
Data/Knowledge Integration
Foundational Semantic Technology Constructs
Ontology: Machine Readable Specification of the Essence of a Given Domain
Machine Readable Knowledge/Facts
Machine Readable Rules
Standard Language(s) for Expressing the Above: XML, RDF, RDFS, OWL, RuleML
RDF Triple Store Capabilities for Storing the Above
Standard Query Languages for Searching the Above: SPARQL
Open Source Semantic Application Frameworks
Commercial Capabilities
Oracle Semantic Technologies http://www.oracle.com/technology/tech/semantic_technologies/index.html
TopQuadrant
Metatomix
Ontoprise
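To make the triple store and query language constructs listed above concrete, here is a toy sketch of facts held as subject/predicate/object triples and matched against a conjunction of patterns. It is an assumption-laden illustration in plain Python, not RDF or SPARQL tooling: the `ex:` names, the `match` and `query` helpers, and the sample facts are all invented for the example.

```python
# Facts as (subject, predicate, object) triples, the basic RDF data shape.
TRIPLES = {
    ("ex:A1", "rdf:type", "ex:Aircraft"),
    ("ex:A1", "ex:basedAt", "ex:DepotNorth"),
    ("ex:B7", "rdf:type", "ex:Aircraft"),
    ("ex:B7", "ex:basedAt", "ex:DepotSouth"),
    ("ex:DepotNorth", "rdf:type", "ex:Depot"),
}

def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, bindings):
    """Try to extend bindings so the pattern equals the triple; None on failure."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:
            return None
    return new

def query(patterns, triples=TRIPLES):
    """Evaluate a conjunction of triple patterns, similar in spirit to a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in triples
                     if (b2 := match(pattern, t, b)) is not None]
    return solutions

# Roughly: SELECT ?a ?d WHERE { ?a rdf:type ex:Aircraft . ?a ex:basedAt ?d . ?d rdf:type ex:Depot }
answers = query([("?a", "rdf:type", "ex:Aircraft"),
                 ("?a", "ex:basedAt", "?d"),
                 ("?d", "rdf:type", "ex:Depot")])
```

Only `ex:A1` is returned, because `ex:DepotSouth` has no asserted type; a real triple store would add indexing, namespaces, and inference on top of this basic matching idea.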
The General Solution Architecture
Semantic Layer Between Apps and Data
Unifying Domain Ontology
Linkage Ontology
OWL/RDF Data Management
Semantic Query Engine
Semantic Mash-up
Semantic Tech Architecture and Building Blocks
RDF(S), OWL, RuleML
Jena
Oracle Semantic Technologies
Semantic Application Development Environment (e.g., TopQuadrant)
The Semantic Approach
Enterprise Data Warehouse
Application
Data Warehouse Services Layer
The legacy databases have been migrated to the data warehouse independently, each with its own logical model.
The overall domain has a robust domain ontology. Linking ontologies and rules relate and map the domain ontology to the underlying logical models.
The application captures and submits the unified query to the semantic engine.
The semantic engine converts the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
Domain Ontologies & Rules
Ontology & Rule Authoring Tools
Semantic Layer
Semantic Query Engine
Semantic Query Results Mash-Up
Domain, Legacy, & Derived Facts
Linking Ontologies & Rules
OWL/RDF DBMS
Typical EDW Architecture
Does Not Have a Data Integration Layer
This Is a Problem If Total Discipline in Conforming to Enterprise Model Is Not Exercised
And It Has Not Been Exercised
Legacy Databases Independently Migrated
Accomplishing Data Integration in the Application Layer Is Difficult and Expensive
Current EDW Architecture
Figure 4: Layered EDW Architecture
Tiers: Presentation Tier; Presentation Transformation Tier; Application Tier; Data Access Tier (ODBC/JDBC); EDW Data Tier
Other labels: Users; Power Users; AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Pre-Generated Cache; Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); Business Intelligence Tools: Cognos, BOBJ, Other (Siebel, MS); AJAX
Key Observations
The as-is architecture does not include an explicit knowledge management or semantics layer.
This is a problem because the effectiveness of the as-is EDW depends on its ability to resolve semantic disconnects between the databases that must be queried.
Building these capabilities into the application layer code is very difficult and costly, and is not responsive to highly dynamic situations.
EDW Architecture with Semantic Layer
Figure 5: EDW Architecture with Semantic Layer
Tiers: Presentation Tier; Presentation Transformation Tier; Application Tier; Semantic Tier; Data Access Tier (ODBC/JDBC); EDW Data Tier
Other labels: Users; Power Users; AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Pre-Generated Cache; Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); Business Intelligence Tools: Cognos, BOBJ, Other (Siebel, MS); AJAX; Semantic Tools
Semantic Layer: Domain Ontologies & Rules; Ontology & Rules Authoring Tools; Semantic Query Engine; Semantic Query Results Mash-Up; Domain, Legacy, & Derived Facts; Linking Ontologies & Rules; OWL/RDF DBMS
Key Observations
The to-be architecture includes a semantics layer which mediates between the domain query and the data layer.
The semantic layer includes the domain ontology and linkage ontologies, which drive the processing of domain queries and the semantic mash-up of individual query results coming from the source databases.
A Major Obstacle: Computational Complexity
Many Operations of the Semantic Layer Are Computationally Intensive
Complex Queries Across Multiple Large Data Sources Are Computationally Intensive
Some Kind of Specialized Solution to Execution of the Semantic Operations and Multiple Source Queries Is Required
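A small sketch can suggest why semantic operations become computationally intensive. The code below applies two RDFS-style rules by naive forward chaining until no new facts appear. It is a deliberately simple illustration with invented class names, not how the LNSSI DAS or any production reasoner works; the `infer` and `chain` helpers are hypothetical.

```python
def infer(triples):
    """Naive forward chaining for two RDFS-style rules until a fixed point is reached.

    Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    Rule 2: (x type a), (a subClassOf b)       => (x type b)
    """
    facts = set(triples)
    passes = 0
    while True:
        passes += 1
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        typ = [(s, o) for s, p, o in facts if p == "type"]
        new = set()
        for a, b in sub:          # every pass compares all pairs of subclass facts
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        for x, c in typ:
            for a, b in sub:
                if c == a:
                    new.add((x, "type", b))
        new -= facts
        if not new:
            return facts, passes
        facts |= new

def chain(n):
    """A class hierarchy C0 < C1 < ... < C(n-1), asserted one link at a time."""
    return [(f"C{i}", "subClassOf", f"C{i + 1}") for i in range(n - 1)]

asserted = chain(20) + [("item1", "type", "C0")]
closed, passes = infer(asserted)
```

Here 20 asserted facts grow to 210 after inference (190 subclass facts plus 20 type facts), and every pass rescans all pairs of subclass facts. With large ontologies and large data, that growth is the kind of load the slide has in mind.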
Hypothesis
The LNSSI DAS Capability Can Be Brought to Bear, with Significant Benefits, on Execution of the Semantic Operations and, of Course, the Source Queries as Well
So the Big Question Is:
Can the LNSSI DAS Be Applied to Large Ontologies, Rule Sets, and Large Data to Provide a Very High Performance Semantic Query Engine?
The LNSSI DAS Based Solution Architecture
Semantic Layer
Federated Domain Ontologies
Transformation Rules
Semantic Engines
LNSSI DAS Used in Two Contexts:
Semantic Ops in the Semantic Layer
Source Queries at the Data Services Level
The LNSSI DAS Based Solution Architecture
Enterprise Data Warehouse
Query Application
LNSSI DAS Services Layer
The legacy databases have been migrated to the data warehouse independently, each with its own logical model.
The overall domain has a robust domain ontology. Linking ontologies and rules relate and map the domain ontology to the underlying logical models.
The application captures and submits the unified query to the semantic engine.
The semantic engine converts the query to a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
Domain Ontologies & Rules
Ontology & Rule Authoring Tools
Semantic Layer Services
Semantic Update/Query Engine
Semantic Query Results Mash-Up
Domain, Legacy, & Derived Facts
Linking Ontologies & Rules
LNSSI DAS Based OWL/RDF Query
RDBMS Based OWL/RDF DBMS
What To Do?
Let's Execute a Prototype Project to Test the Hypothesis That the LNSSI DAS Can Be Successfully Brought to Bear on Large Data Integration Problems Requiring Semantic Integration
General Approach
Find a Sponsor with Requisite $
Form Appropriate Team and Agreements
Initiate a Prototype Project
Apply Semantic Technologies
Ontology, RDF(S), OWL
RDF Triple Store, SPARQL
Inference Engine(s)
Jena
Leverage LNSSI DAS Architecture and Capability
Create an LNSSI Semantic Integration Platform
Find a Benchmark Problem for Which There Is Already Data, Associated Semantics, and Existing Query Performance Data
Major Activities
Project Initiation
Initial Analysis, Technology Research, and Knowledge Engineering
CONOPS and System Requirements Development/Specification
Detailed Program/Project Management
Knowledge Engineering
Domain Ontology Acquisition/Development
Ontology Legacy Data Relationship, Mapping, and Rule Acquisition/Development
Acquisition/Development of the Underlying Data
Architecture Development and System Design
Detailed Apps Requirements/Design
Inclusion of RDF/OWL Data Mgt
Inclusion of Semantic Query Engine
Development of Semantic Mash-up Capabilities
Implementation Planning
Implementation
Test and Eval