Dave Lush, Senior SME Aha! Analytics


Page 1

This Briefing is: UNCLASSIFIED

Aha! Analytics, 2278 Baldwin Drive

Phone: (937) 477-2983, FAX: (866) 450-3812


Dave Lush, Senior SME

Aha! Analytics

Semantic Integration Layer for the As-Is Enterprise Data Warehouse

Page 2

Purpose(s)

Communicate Some Observations About the General Data Integration Problem

Cite and Discuss the Semantic Technologies

Propose a Semantic Data Integration Layer for the General Data Warehouse Architecture

Discuss a LexisNexis SSI (LNSSI) Data Analytics Supercomputer (DAS) Based Solution

Present Initial Thoughts on the Plan

Page 3

Topics

Purpose

Background

Givens/Problems/Tasks

Approaches to Data/Info Integration

Semantic Technologies

General Solution Architecture

LNSSI DAS Based Solution Architecture

Thoughts On the Plan

Page 4

Background

Data Integration Problems

Application and Enterprise Model Based Approaches

Data Integration Problems Persist

Not Adequately Leveraging Available Metadata

Need for Improved Discovery and Semantic Integration

Emergence of Semantic Technologies

Emergence of LNSSI DAS Capability

Page 5

The Primary Givens/Problem/Task

Givens:

A Collection of Disparate Legacy Databases, Perhaps Already Migrated to an Enterprise Data Warehouse, Each with Its Own Independently Developed Logical Data Model and Query Interface

The Requirement To Pose Single Unified Queries Across The Collection Of Legacy Databases And Achieve Semantically Consistent (Coherent) Results

The Problem:

Difficulties in Achieving Useful Results Because of Unresolved Semantic Disconnects in the Disparate Logical Models

Note: The Problem Is Not Primarily One of Discovering Relevant, Already Existing Product Objects, but Rather One of Discovering and Semantically Integrating the Requisite Product Content from Multiple Sources

The Task at Hand:

Define, Design, and Implement a Capability for the Semantic Integration and Unified Query of the Collection of Disparate Legacy Databases to Achieve Semantically Coherent Results

Page 6

Basic Data/Info Integration Approaches

Application Centric Approach

Do It All in the Application Layer Via Ad Hoc Hand Coding

This Is Very Expensive And Difficult!

Enterprise Information Model and Data Warehouse Approach

Do It Via EDW ETL Methods/Tools in the Context of Strict Conformance with an Overarching Enterprise Info Model

This Is Also Very Expensive And Difficult And Requires Great Discipline!

Enterprise Information Integration (EII) Approach

Establish a Single Common View of the Disparate Legacy Sources

Process/Parse Common Domain-wide Queries into Individual Legacy Source Queries and Execute Source Queries

Integrate Source Query Results into Unified Response to the Domain-wide Query
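The three EII steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the briefing's design. The two in-memory SQLite databases, their table and column names, and the hand-written per-source SQL are all hypothetical stand-ins for independently developed legacy schemas. The common view is `Aircraft(id, base)`. The mediator breaks one domain-wide question into one query per source and integrates the returned rows.

```python
import sqlite3

# Hypothetical legacy source A: its own table and column names.
def make_source_a() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE acft (tail_no TEXT, home_base TEXT)")
    con.executemany("INSERT INTO acft VALUES (?, ?)",
                    [("A-001", "Alpha"), ("A-002", "Bravo")])
    return con

# Hypothetical legacy source B: a generic asset register with a type column.
def make_source_b() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE asset_reg (id TEXT, location_name TEXT, asset_type TEXT)")
    con.executemany("INSERT INTO asset_reg VALUES (?, ?, ?)",
                    [("B-100", "Bravo", "AIRCRAFT"), ("B-200", "Bravo", "VEHICLE")])
    return con

# Per-source translation of the domain-wide query "aircraft at base ?"
# into each source's native schema (hand-written here for illustration).
SOURCE_QUERIES = {
    "A": "SELECT tail_no, home_base FROM acft WHERE home_base = ?",
    "B": ("SELECT id, location_name FROM asset_reg "
          "WHERE asset_type = 'AIRCRAFT' AND location_name = ?"),
}

def aircraft_at(base: str, sources: dict) -> list:
    """Decompose the domain-wide query, execute each source query, integrate results."""
    unified = []
    for name, con in sources.items():
        for row_id, row_base in con.execute(SOURCE_QUERIES[name], (base,)):
            unified.append({"id": row_id, "base": row_base, "source": name})
    return sorted(unified, key=lambda r: r["id"])

if __name__ == "__main__":
    sources = {"A": make_source_a(), "B": make_source_b()}
    for row in aircraft_at("Bravo", sources):
        print(row)
```

In this sketch, every new source or schema change means editing `SOURCE_QUERIES` by hand. That is the cost the approaches on this page try to reduce.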

Page 7

The Basic Data Integration Challenge

Figure: The Application reaches each of the Legacy Databases through a separate Data Interface

The application must process the unified query, formulate and submit associated queries against the disparate databases, and properly integrate the results into a unified response.

This requires that the application handle disparate data interfaces, and that it contain the necessary semantics regarding the problem domain, the relationships/mappings between the problem domain and the legacy data models, and the code that accomplishes the mappings.

Logical models for these databases were generally developed independently of each other.

Page 8

The Enterprise Data Warehouse Approach

Figure: Application, Data Warehouse Services Layer, and Enterprise Data Warehouse

The legacy databases are migrated to a data warehouse in the context of an overarching enterprise data model so that the logical data models for the individual databases are semantically consistent with the overall model.

The application still must process the unified query, formulate and submit associated queries against the disparate warehouse databases, and properly integrate the results into a unified response.

In theory, however, this process should not suffer serious semantic inconsistency problems, because the individual logical databases in the warehouse are supposed to have logical models that are consistent with an overarching enterprise information model.

Figure: Source data passes through common Extract Transform Load (ETL) services, governed by the Enterprise Model & Meta-data and a Meta-Data Mgt Tool, to become target data in the warehouse. Callout: Logical models for these databases are consistent with the overarching domain model.
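The conformance discipline that this ETL step depends on can be sketched as a metadata-driven mapping plus a check against the enterprise model. Everything here is hypothetical: the model's attributes, the column names, and the date format. Real ETL tools keep this metadata in their own repositories rather than in Python. Rows that cannot be mapped, or that do not conform, are set aside instead of being loaded.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Callable

# Hypothetical enterprise model: each conformed attribute and its required type.
ENTERPRISE_MODEL = {"asset_id": str, "base": str, "last_inspected": date}

@dataclass(frozen=True)
class ColumnMapping:
    """One metadata entry: source column -> enterprise attribute, plus a transform."""
    source_column: str
    target_attribute: str
    transform: Callable[[Any], Any] = lambda value: value

def parse_us_date(text: str) -> date:
    return datetime.strptime(text, "%m/%d/%Y").date()

# Hypothetical metadata for one legacy source.
SOURCE_A_MAPPINGS = [
    ColumnMapping("tail_no", "asset_id"),
    ColumnMapping("home_base", "base", str.strip),
    ColumnMapping("insp_dt", "last_inspected", parse_us_date),
]

def conforms(record: dict) -> bool:
    """True if the record has exactly the model's attributes with the model's types."""
    return (record.keys() == ENTERPRISE_MODEL.keys()
            and all(isinstance(record[k], t) for k, t in ENTERPRISE_MODEL.items()))

def extract_transform(rows, mappings):
    """Apply the metadata mappings; split rows into loadable records and rejects."""
    loaded, rejected = [], []
    for row in rows:
        try:
            record = {m.target_attribute: m.transform(row[m.source_column])
                      for m in mappings}
        except (KeyError, ValueError, TypeError):
            rejected.append(row)  # missing column or unparseable value
            continue
        if conforms(record):
            loaded.append(record)
        else:
            rejected.append(row)  # mapped, but violates the enterprise model
    return loaded, rejected
```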

Page 9

Problems Ensue

The Imperative to Abide by a Standard Global Data Model Does Not Prevail

Stove Piped DBs Abound

Semantics of the Stove Pipes Are Inconsistent

Federated Queries Yield Semantically Inconsistent Results

Cannot Replace/Re-engineer Legacy DBs Housed in the EDW

Cannot Replace the EDW Platform (e.g., Teradata) In Use Today
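A small worked example shows how a federated query can be syntactically correct and still semantically inconsistent. The two sources, their values, and the unit conventions (US gallons in one, liters in the other) are invented for illustration. The unit exists only in each stove pipe's implicit semantics, so a naive union silently adds unlike quantities.

```python
# Hypothetical stove-pipe sources answering "fuel on hand at base Bravo".
# The unit is implicit in each source's model and never appears in the data.
SOURCE_A_ROWS = [("A-002", 100.0)]   # source A records US gallons
SOURCE_B_ROWS = [("B-100", 500.0)]   # source B records liters

LITERS_PER_US_GALLON = 3.785411784

def naive_total(*sources):
    """Federated sum that ignores each source's semantics."""
    return sum(qty for rows in sources for _, qty in rows)

def total_liters(gallon_rows, liter_rows):
    """Same query after a semantic rule maps both sources to one unit."""
    return (sum(qty * LITERS_PER_US_GALLON for _, qty in gallon_rows)
            + sum(qty for _, qty in liter_rows))

if __name__ == "__main__":
    print(naive_total(SOURCE_A_ROWS, SOURCE_B_ROWS))    # 600.0, mixed units
    print(total_liters(SOURCE_A_ROWS, SOURCE_B_ROWS))   # about 878.54 liters
```

Neither database flags either number as wrong. Only the second is meaningful, and producing it required knowledge that sits outside both schemas.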

Page 10

New Imperative

Must Have Some Effective Way to Semantically Integrate the Information Acquired from the Multiplicity of Databases

Page 11

Semantic Integration

Use Semantic Technologies in the Context of the EII Approach (cited previously)

Unified Ontology of Current Situation/View Is Developed and Expressed in OWL or Appropriate Successor Language

Semantic Relationships Between Legacy Data and Rules for Transformation From Legacy to Current View Are Specified and Captured Via OWL or Appropriate Successor Language

Queries in Terms of Current Unified View Are Parsed and Transformed Into Queries of Legacy Sources by a Semantic Query Engine.

Individual Legacy Source Queries Are Executed.

Results Are Transformed and Processed Into a Unified Response by the Semantic Mash-up Engine.
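The parse-and-transform step can be sketched if the linkage relationships are held as data rather than code. Here, plain Python tuples stand in for OWL/RDF statements. The `dom:`/`link:` vocabulary, the table names, and the columns are hypothetical. A real semantic query engine would work from OWL and a query language such as SPARQL rather than from string templates. The point of the sketch is that adding a source means adding triples, not rewriting the application.

```python
# Hypothetical linkage ontology, written as (subject, predicate, object) triples.
# "dom:" terms belong to the unified domain view; "A:"/"B:" name legacy sources.
LINKS = [
    ("dom:Aircraft", "link:table", "A:acft"),
    ("dom:Aircraft", "link:table", "B:asset_reg"),
    ("dom:tailNumber", "link:column", "A:acft.tail_no"),
    ("dom:tailNumber", "link:column", "B:asset_reg.id"),
    ("dom:base", "link:column", "A:acft.home_base"),
    ("dom:base", "link:column", "B:asset_reg.location_name"),
    ("B:asset_reg", "link:filter", "asset_type = 'AIRCRAFT'"),
]

def objects(subject: str, predicate: str) -> list:
    """All objects o such that (subject, predicate, o) is in LINKS."""
    return [o for s, p, o in LINKS if s == subject and p == predicate]

def column_for(prop: str, qualified_table: str) -> str:
    """Find the legacy column that a domain property maps to in one table."""
    for target in objects(prop, "link:column"):
        table_part, column = target.rsplit(".", 1)
        if table_part == qualified_table:
            return column
    raise LookupError(f"{prop} has no mapping in {qualified_table}")

def rewrite(domain_class: str, select_props: list, filter_prop: str) -> dict:
    """Rewrite 'SELECT props WHERE filter_prop = ?' into one SQL query per source."""
    queries = {}
    for qualified_table in objects(domain_class, "link:table"):
        source, table = qualified_table.split(":", 1)
        columns = ", ".join(column_for(p, qualified_table) for p in select_props)
        conditions = [f"{column_for(filter_prop, qualified_table)} = ?"]
        conditions += objects(qualified_table, "link:filter")
        queries[source] = f"SELECT {columns} FROM {table} WHERE {' AND '.join(conditions)}"
    return queries

if __name__ == "__main__":
    for source, sql in rewrite("dom:Aircraft", ["dom:tailNumber", "dom:base"], "dom:base").items():
        print(source, sql)
```

The generated statements are what the "Individual Legacy Source Queries Are Executed" step would run, with `?` bound to the requested base.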

Page 12

Semantic Technologies Rapidly Maturing with Very Noteworthy Applications

Enhanced Knowledge Discovery

Data/Knowledge Integration

Foundational Semantic Technology Constructs

Ontology: Machine Readable Specification of the Essence of a Given Domain

Machine Readable Knowledge/Facts

Machine Readable Rules

Standard Language(s) for Expressing the Above: XML, RDF, RDFS, OWL, RuleML

RDF Triple Store Capabilities for Storing the Above

Standard Query Languages for Searching the Above: SPARQL

Open Source Semantic Application Frameworks

Commercial Capabilities

Oracle Semantic Technologies http://www.oracle.com/technology/tech/semantic_technologies/index.html

TopQuadrant, Metatomix, Ontoprise
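The constructs listed above (triples, an ontology, rules, and a pattern-matching query language) can be illustrated with a toy in-memory triple store. This is not SPARQL or RDFS as implemented by Jena or Oracle. It is a sketch that hand-codes two RDFS entailment rules (subclass transitivity and type inheritance) and evaluates a SPARQL-style basic graph pattern. The `ex:` vocabulary and the data are hypothetical. Assuming the usual prefixes, the equivalent SPARQL would be roughly `SELECT ?x ?b WHERE { ?x rdf:type ex:Asset . ?x ex:base ?b }`.

```python
def match(pattern, triple, binding):
    """Extend a variable binding so the pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # query variable
            if extended.setdefault(term, value) != value:
                return None                           # conflicts with earlier binding
        elif term != value:
            return None
    return extended

def select(graph, patterns):
    """Evaluate a basic graph pattern (a conjunction of triple patterns)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

def rdfs_closure(graph):
    """Add triples entailed by subClassOf transitivity and rdf:type inheritance."""
    closed = set(graph)
    while True:
        subclass = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in closed if p == "rdf:type"]
        new = {(a, "rdfs:subClassOf", d) for a, b in subclass for c, d in subclass if b == c}
        new |= {(x, "rdf:type", b) for x, cls in typed for a, b in subclass if cls == a}
        if new <= closed:
            return closed                             # fixed point reached
        closed |= new

GRAPH = {
    ("ex:Fighter", "rdfs:subClassOf", "ex:Aircraft"),
    ("ex:Aircraft", "rdfs:subClassOf", "ex:Asset"),
    ("ex:A-001", "rdf:type", "ex:Fighter"),
    ("ex:A-001", "ex:base", "ex:Alpha"),
}

QUERY = [("?x", "rdf:type", "ex:Asset"), ("?x", "ex:base", "?b")]

if __name__ == "__main__":
    print(select(GRAPH, QUERY))                 # [] : nothing is asserted to be an Asset
    print(select(rdfs_closure(GRAPH), QUERY))   # [{'?x': 'ex:A-001', '?b': 'ex:Alpha'}]
```

The query returns nothing over the asserted facts and one answer over the closure. The ontology, not the data, supplies the fact that a fighter is an asset.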

Page 13

The General Solution Architecture

Semantic Layer Between Apps and Data

Unifying Domain Ontology

Linkage Ontology

OWL/RDF Data Management

Semantic Query Engine

Semantic Mash-up

Semantic Tech Architecture and Building Blocks

RDF(S), OWL, RuleML

Jena

Oracle Semantic Technologies

Semantic Application Development Environment (e.g., TopQuadrant)

Page 14

The Semantic Approach

Figure: Application, Data Warehouse Services Layer, and Enterprise Data Warehouse

The legacy databases have been migrated to the data warehouse independently, each with its own logical model.

The overall domain has a robust domain ontology, and linking ontologies and rules relate and map the domain ontology to the underlying logical models.

The application captures and submits the unified query to the semantic engine.

The semantic engine processes the query into a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.

Figure: Semantic Layer components: Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; OWL/RDF DBMS; Ontology & Rule Authoring Tools; Semantic Query Engine; and Semantic Query Results Mash-Up
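The mash-up service's job can be sketched as identity resolution plus conflict resolution over the per-source results. Both rules here are assumptions chosen for illustration. Identities come from an `owl:sameAs`-style lookup table, and when sources disagree on an attribute, the most recent value wins. The keys, attributes, and dates are hypothetical.

```python
from datetime import date

# Hypothetical identity links (in the spirit of owl:sameAs) from legacy keys
# to a single domain entity identifier.
SAME_AS = {"A:A-001": "dom:ac1", "B:B-100": "dom:ac1"}

def mash_up(rows):
    """Merge (source, key, attribute, value, as_of) rows into domain entities.

    Rows describing the same entity are combined; when sources disagree on an
    attribute, the most recent value wins (an assumed conflict rule).
    """
    merged = {}
    for source, key, attribute, value, as_of in rows:
        local_id = f"{source}:{key}"
        entity = merged.setdefault(SAME_AS.get(local_id, local_id), {})
        current = entity.get(attribute)
        if current is None or as_of > current[1]:
            entity[attribute] = (value, as_of)
    # Drop the timestamps from the integrated result.
    return {eid: {attr: val for attr, (val, _) in attrs.items()}
            for eid, attrs in merged.items()}

if __name__ == "__main__":
    results = [
        ("A", "A-001", "base", "Alpha", date(2009, 1, 5)),
        ("B", "B-100", "base", "Bravo", date(2009, 2, 1)),
        ("B", "B-100", "status", "mission capable", date(2009, 2, 1)),
        ("A", "A-002", "base", "Alpha", date(2009, 1, 5)),
    ]
    print(mash_up(results))
```

Here the A-001 and B-100 records become one integrated entity. A-002, which has no identity link, passes through under its source-qualified key.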

Page 15

Typical EDW Architecture

Does Not Have a Data Integration Layer

This Is a Problem If Total Discipline in Conforming to Enterprise Model Is Not Exercised

And It Has Not Been Exercised

Legacy Databases Independently Migrated

Accomplishing Data Integration in the Application Layer Is Difficult and Expensive

Page 16

Figure 4: Layered EDW Architecture (Current EDW Architecture). Tiers shown: Presentation Tier, Presentation Transformation Tier, Application Tier, Data Access Tier (ODBC/JDBC), and EDW Data Tier. Components include Users and Power Users; the AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); AJAX; a Pre-Generated Cache; Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); and Business Intelligence Tools (Cognos, BOBJ, Other (Siebel, MS)).

Key Observations

The as-is architecture does not include an explicit knowledge management or semantics layer.

This is a problem because the effectiveness of the as-is EDW depends on its ability to resolve semantic disconnects between the databases that must be queried.

Building these capabilities into application layer code is very difficult and costly, and the result does not respond well to highly dynamic situations.

Page 17

EDW Architecture with Semantic Layer

Figure 5: EDW Architecture with Semantic Layer. The same Presentation, Presentation Transformation, Application, Data Access (ODBC/JDBC), and EDW Data tiers as the current architecture, plus a Semantic Tier with Semantic Tools. Its Semantic Layer holds Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; an OWL/RDF DBMS; Ontology & Rules Authoring Tools; the Semantic Query Engine; and Semantic Query Results Mash-Up.

Key Observations

The to-be architecture includes a semantics layer that mediates between the domain query and the data layer.

The semantic layer includes the domain ontology and linkage ontologies which drive the processing of domain queries and the semantic mashup of individual query results coming from the source databases.

Page 18

A Major Obstacle: Computational Complexity

Many Operations of the Semantic Layer Are Computationally Intensive

Complex Queries Across Multiple Large Data Sources Are Computationally Intensive

Some Kind of Specialized Solution to Execution of the Semantic Operations and Multiple Source Queries Is Required
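A back-of-the-envelope estimate shows why multi-source queries get expensive. Assume k sources of n rows each, all joined on one shared key with one-to-one matches. A naive nested-loop join examines every combination of rows, n^k in total. A hash join builds one table on the key and probes it with each of the other sources, for about k·n operations. These are simplified textbook cost models, not measurements of any system named in this briefing.

```python
def nested_loop_comparisons(n_rows: int, k_sources: int) -> int:
    """Row combinations a naive nested-loop join over k sources examines."""
    return n_rows ** k_sources

def hash_join_operations(n_rows: int, k_sources: int) -> int:
    """Build one hash table on the shared key (n inserts), then probe it with
    each of the other k - 1 sources (n probes each); assumes 1:1 matches."""
    return n_rows + (k_sources - 1) * n_rows

if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, nested_loop_comparisons(10**6, k), hash_join_operations(10**6, k))
```

Take a million rows per source and three sources, with an assumed rate of a billion comparisons per second. The naive plan examines 10^18 combinations, roughly 32 years of work. The hash plan needs about three million operations. Semantic operations such as inference add their own cost on top of this.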

Page 19

Hypothesis

The LNSSI DAS Capability Can Be Brought to Bear, with Significant Benefits, on Executing the Semantic Operations as Well as the Source Queries

So the Big Question Is:

Can the LNSSI DAS Be Applied to Large Ontologies, Rule Sets, and Large Data to Provide a Very High Performance Semantic Query Engine?

Page 20

The LNSSI DAS Based Solution Architecture

Semantic Layer

Federated Domain Ontologies

Transformation Rules

Semantic Engines

LNSSI DAS Used in Two Contexts:

Semantic Ops in the Semantic Layer

Source Queries at the Data Services Level
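The data-parallel idea behind using a cluster platform in both contexts can be sketched generically. Partition the triples across nodes by a hash of the subject, scan every partition independently, and merge the partial answers. This is not the LNSSI DAS programming interface. The node count, the CRC-32 partitioning choice, and the helper names are assumptions. Python threads merely stand in for cluster nodes, and under CPython's global interpreter lock they do not deliver real CPU parallelism.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def partition(triples, n_nodes):
    """Assign each triple to a node by a stable hash of its subject."""
    parts = [[] for _ in range(n_nodes)]
    for triple in triples:
        parts[zlib.crc32(triple[0].encode("utf-8")) % n_nodes].append(triple)
    return parts

def scan(part, predicate, obj):
    """Per-node work: subjects in this partition matching (?s, predicate, obj)."""
    return [s for s, p, o in part if p == predicate and o == obj]

def parallel_select(triples, predicate, obj, n_nodes=4):
    """Scatter the pattern to every partition, then gather and merge the results."""
    parts = partition(triples, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(scan, parts,
                                 [predicate] * n_nodes, [obj] * n_nodes))
    return sorted(s for partial in partials for s in partial)

if __name__ == "__main__":
    data = [(f"ex:A-{i:03d}", "ex:base", "ex:Alpha" if i % 2 else "ex:Bravo")
            for i in range(10)]
    print(parallel_select(data, "ex:base", "ex:Alpha"))
```

Partitioning by subject keeps all statements about one entity on the same node, so per-entity work needs no cross-node traffic. The answer does not depend on how many nodes are used.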

Page 21

Figure: Query Application, LNSSI DAS Services Layer, and Enterprise Data Warehouse

The legacy databases have been migrated to the data warehouse independently, each with its own logical model.

The overall domain has a robust domain ontology, and linking ontologies and rules relate and map the domain ontology to the underlying logical models.

The application captures and submits the unified query to the semantic engine.

The semantic engine processes the query into a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.

Figure: The LNSSI DAS Based Solution Architecture. Semantic Layer Services: Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts; Ontology & Rule Authoring Tools; Semantic Update/Query Engine; and Semantic Query Results Mash-Up. Storage: an LNSSI DAS Based OWL/RDF Query capability and an RDBMS Based OWL/RDF DBMS.

Page 22

What To Do?

Let's Execute a Prototype Project To Test the Hypothesis That the LNSSI DAS Can Be Successfully Brought to Bear On Large Data Integration Problems Requiring Semantic Integration

Page 23

General Approach

Find a Sponsor with Requisite $

Form Appropriate Team and Agreements

Initiate a Prototype Project

Apply Semantic Technologies

Ontology, RDF(S), OWL

RDF Triple Store, SPARQL

Inference Engine(s)

Jena

Leverage LNSSI DAS Architecture and Capability

Create an LNSSI Semantic Integration Platform

Find a Benchmark Problem for Which There Is Already Data, Associated Semantics, and Existing Query Performance Data

Page 24

Major Activities

Project Initiation

Initial Analysis, Technology Research, and Knowledge Engineering

CONOPS and System Requirements Development/Specification

Detailed Program/Project Management

Knowledge Engineering

Domain Ontology Acquisition/Development

Ontology-to-Legacy-Data Relationship, Mapping, and Rule Acquisition/Development

Acquisition/Development of the Underlying Data

Architecture Development and System Design

Detailed Apps Requirements/Design

Inclusion of RDF/OWL Data Mgt

Inclusion of Semantic Query Engine

Development of Semantic Mashup Capabilities

Implementation Planning

Implementation

Test and Eval
