Michael Lang Sr. Presentation

Transcript of Michael Lang Sr. Presentation

Page 1: Michael Lang Sr. Presentation

Semantic Software Architecture

Using Semantic Technology to Build the Enterprise Information Web

Michael Lang [email protected]

Page 2: Michael Lang Sr. Presentation

Emergent Analytics

Extensible enterprise information management paradigm

Add semantics to all aspects of the enterprise's information systems

All information becomes easily accessible using SPARQL

Add new information easily

Understand how everything is related and what it is

Provides the capability to analyze information enterprise wide

Page 3: Michael Lang Sr. Presentation

IT

Page 4: Michael Lang Sr. Presentation

Information Technology

The technology that enables the management of all types of information

Create it – works great

Store it – works great

Change it – works great

Find it – not so good

Analyze it – very complex, very difficult

Use it – works great if you are inside the application that creates it, otherwise BIG problem

Commonly called SILOS

We all want FEDERATION

Page 5: Michael Lang Sr. Presentation

The term “Semantic Web” will not appear in this presentation

Page 6: Michael Lang Sr. Presentation

Semantic Technology

IT

Page 7: Michael Lang Sr. Presentation

New Information Management Paradigm

Semantic Technology is a layer of description that sits within the current IT infrastructure

We build the descriptions using OWL and RDF

We access the descriptions at run-time using SPARQL

OWL and RDF are unique because they are a description language and an information model that has its own unique aspects

Enables a radical transformation of IT capabilities

Completely distributed information management

FEDERATION

Page 8: Michael Lang Sr. Presentation

Information Federation

Enterprises are made up of many domains within domains

Sales, Operations, R&D, Executive Management, Manufacturing, Logistics, HR, Finance, Intelligence, …

Each domain fields its own applications and creates its own information to execute its mission

It is normally not possible to federate and integrate applications within domains, across domains or with partners

Enterprises will not take the next step in analytic capability until they first solve the INFORMATION federation problem

Page 9: Michael Lang Sr. Presentation

What are RDF and OWL for?

They are only used for one thing....

To DESCRIBE things

ANYTHING

Machines can UNDERSTAND the descriptions
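
As a minimal illustration of what "describing things" can mean in practice, the sketch below stores descriptions as RDF-style subject/predicate/object triples in plain Python and answers a pattern query over them. Every identifier (ex:Employee, ex:worksIn, and so on) is a hypothetical example and does not come from the presentation; a real deployment would use an RDF store and SPARQL rather than this toy matcher.

```python
# Toy RDF-style graph: each fact is a (subject, predicate, object) triple.
# All names below are hypothetical examples, not from the presentation.
TRIPLES = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksIn", "ex:Finance"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:bob", "ex:worksIn", "ex:Logistics"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def instances_of(triples, cls):
    """Find instances of cls, including instances of its direct subclasses."""
    classes = {cls} | {s for s, _, _ in match(triples, p="rdfs:subClassOf", o=cls)}
    return sorted(
        {s for c in classes for s, _, _ in match(triples, p="rdf:type", o=c)}
    )


# Nothing is typed ex:Person directly, yet the subclass description lets a
# program work out that both employees are people.
people = instances_of(TRIPLES, "ex:Person")
```

The point of the example is the last line: the answer is derived from the descriptions themselves, not from logic hard-coded for one application.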

Page 10: Michael Lang Sr. Presentation

Federation Requires Description

Information discovery, reuse, and integration all depend on description

If we do not know what something is, we cannot possibly know how to integrate it with other things or even how it should be used

If we describe everything well enough, we are in a position to have a knowledge-based web

Integrate and interoperate

Analyze any combination of information

RDF & OWL enable information federation

Both machines and people can understand the descriptions

Page 11: Michael Lang Sr. Presentation

Defense Advanced Research Projects Agency

Relational Database Technology, TCP/IP, OWL/RDF

DARPA creates the DARPA Agent Markup Language (DAML) program in 2000 to facilitate information federation - DAML.org

W3C takes the work funded by DARPA and others to create the Resource Description Framework (RDF) and Web Ontology Language (OWL) specifications

These standards comprise an excellent information management technology architecture

There are no other standards that can be used to accomplish the goal of information federation

Page 12: Michael Lang Sr. Presentation

World Wide Web Architecture

Mature

Active Research and Standards Activity

Commercial Cutting Edge

Page 13: Michael Lang Sr. Presentation

Semantic Software Architecture

Page 14: Michael Lang Sr. Presentation

Semantic Software Architecture

All components support RDF, OWL and/or SPARQL as well as other web technologies

OWL modeling tools

RDF stores

Spyders

Federators

SPARQL endpoints

Visualization tools

Analytic tools

SPARQL endpoint registry

Page 15: Michael Lang Sr. Presentation

Spyder

Software component that transforms relational data formats to RDF using the mapping ontology

Adds the semantics of any domain ontology to any database

Provides SPARQL endpoint for relational databases

Generates information about sources to optimize performance

Exposes full power of SQL

Allows mappings themselves to be analyzed

Minimizes or eliminates the need for triple stores

Easier to use than ETL

REUSABLE
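
The relational-to-RDF transformation described on this slide can be sketched roughly as follows. This is an illustration only, not the Spyder implementation: the MAPPING dictionary stands in for a mapping ontology, and the table, column, and property names (employee, hr:name, hr:department) are assumptions made for the example.

```python
import sqlite3

# Hypothetical mapping: table -> class, key column, and column -> property.
# A plain dict stands in for the mapping ontology here.
MAPPING = {
    "employee": {
        "class": "hr:Employee",
        "key": "id",
        "columns": {"name": "hr:name", "dept": "hr:department"},
    }
}


def rows_to_triples(conn, mapping):
    """Translate every mapped row into (subject, predicate, object) triples."""
    triples = []
    for table, m in mapping.items():
        cols = [m["key"], *m["columns"]]
        query = f"SELECT {', '.join(cols)} FROM {table} ORDER BY {m['key']}"
        for row in conn.execute(query):
            subject = f"{table}/{row[0]}"
            triples.append((subject, "rdf:type", m["class"]))
            for col, value in zip(cols[1:], row[1:]):
                if value is not None:  # NULLs produce no triple
                    triples.append((subject, m["columns"][col], value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", "Finance"), (2, "Bob", None)],
)
triples = rows_to_triples(conn, MAPPING)
```

Because the mapping is data rather than code, the same function can be pointed at another database by supplying a different mapping, which is one reading of the "REUSABLE" claim above.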

Page 16: Michael Lang Sr. Presentation

Federator

Enables users to query multiple RDF graphs exposed by Spyders as if they were a single graph

Uses the source metadata provided by Spyders to optimize performance

Works against the native information sources

Does not require RDF to be moved into a triple store before it is queried

Delegates the maximum amount of processing as far down as possible

Better solution than traditional ETL based processes

Uses the domain ontology and mapping ontology

Supports complex analytics

Integrated with rules engine
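
A rough sketch of the federation idea follows, under stated assumptions: each Source object stands in for a Spyder endpoint, and the set of predicates it advertises stands in for the source metadata used for optimization. The class names, method names, and data are hypothetical and do not describe the actual Federator.

```python
class Source:
    """Stand-in for one endpoint: holds triples and advertises its predicates."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = list(triples)
        # Source metadata: the predicates this source is able to answer.
        self.predicates = {p for _, p, _ in self.triples}
        self.calls = 0  # counts how often this source was actually queried

    def match(self, s=None, p=None, o=None):
        self.calls += 1
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


class Federator:
    """Presents several sources as if they were a single graph."""

    def __init__(self, sources):
        self.sources = sources

    def match(self, s=None, p=None, o=None):
        results = []
        for src in self.sources:
            # Skip sources whose metadata says they cannot hold the predicate.
            if p is None or p in src.predicates:
                results.extend(src.match(s, p, o))
        return results


hr = Source("hr", [("emp/1", "hr:name", "Alice"), ("emp/1", "hr:department", "Finance")])
fin = Source("finance", [("emp/1", "fin:salaryGrade", "G7")])
fed = Federator([hr, fin])

grades = fed.match(p="fin:salaryGrade")      # only the finance source is asked
everything_about_emp1 = fed.match(s="emp/1")  # both sources are asked
```

The data never leaves its source until it is queried, and the metadata check keeps irrelevant sources from being contacted at all.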

Page 17: Michael Lang Sr. Presentation

Spyder

[Diagram: Optimizer, Indexes, Planner, Re-Writer, SPARQL Endpoint]

Page 18: Michael Lang Sr. Presentation

Federator

[Diagram: Optimizer, Cache, Indexes, Planner, Re-Writer, Rules, SPARQL Endpoint]

Page 19: Michael Lang Sr. Presentation

[Diagram: Dashboard; Federator (Optimizer, Cache, Indexes, Planner, Re-Writer, Rules, SPARQL Endpoint); two Spyders (each with Optimizer, Indexes, Planner, Re-Writer, SPARQL Endpoint, a Mapping Ontology, and a Metadata Ontology); two Data Sources; Domain Ontology; SPARQL Endpoint Registry; connections labeled SPARQL and SQL]

Page 20: Michael Lang Sr. Presentation

[Diagram: Dashboard; Federator (Optimizer, Cache, Indexes, Planner, Re-Writer, Rules, SPARQL Endpoint); two Spyders (each with Optimizer, Indexes, Planner, Re-Writer, SPARQL Endpoint); two Data Sources; Ontology Repository (Mapping Ontology, Metadata Ontology, Domain Ontology); SPARQL Endpoint Registry; connections labeled SPARQL and SQL]

Page 21: Michael Lang Sr. Presentation

Ontology Architecture

Page 22: Michael Lang Sr. Presentation

Ontology Architecture

An ontology architecture is the system of ontologies required to accomplish a goal

Very much like a software architecture

The goal for an EIW is federation of information sources across business units to enable enterprise reporting and analysis

The ontology architecture of an EIW is designed to solve the information federation problem

While enabling sophisticated analytics

Page 23: Michael Lang Sr. Presentation

EIW Ontology Architecture

[Diagram: Human Resources Domain Ontology, Process Ontology, Standards Ontology, Analytics Ontology, Discussion Ontology, Community Ontology, two Relational Mapping Ontologies, two Source Ontologies, two RDBMSs; labels: Top-down, Bottom-up]

Page 24: Michael Lang Sr. Presentation

EIW Ontology Architecture for Federation

[Diagram: Reporting/Analytics, SPARQL, The Federator, Human Resources Domain Ontology, two Relational Mapping Ontologies, two Source Ontologies, two RDBMSs]

Page 25: Michael Lang Sr. Presentation

Domain Ontology

The Domain Ontology is a conceptual description of a business domain

The “domain” is defined by the business processes, rules, information sources, and any required analytics

Instances in this ontology are the same instances which are currently stored in information sources (databases)

Exposes all information of the domain to any user or application using the business terminology of the domain

In some cases, these business terms are defined by standards

Page 26: Michael Lang Sr. Presentation

Relational Mapping Ontology

Describes how concepts in the domain ontology relate to data in databases

Enables the translation of data from a relational format to RDF format, using terminology defined in the Domain Ontology

We have created a document that defines the Relational Mapping Ontology

This document should be released to the public this year

The D2RQ language was not sufficient for our mission

http://www.knoodl.com/ui/groups/Mapping_Ontology_Community
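
To make the mapping idea concrete, the sketch below rewrites a simple triple pattern expressed in domain terms into SQL against the underlying table. It is not the Relational Mapping Ontology itself, which the slide says was not yet released; PROPERTY_MAP is a hypothetical stand-in, and the table, column, and property names are assumptions.

```python
import sqlite3

# Hypothetical mapping from domain-ontology properties to
# (table, key column, value column).
PROPERTY_MAP = {
    "hr:name": ("employee", "id", "name"),
    "hr:department": ("employee", "id", "dept"),
}


def pattern_to_sql(predicate, obj=None):
    """Rewrite the pattern (?s predicate obj) into SQL plus bind parameters."""
    table, key, column = PROPERTY_MAP[predicate]
    sql = f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL"
    params = []
    if obj is not None:
        # The filter is pushed down so the database does the work.
        sql += f" AND {column} = ?"
        params.append(obj)
    return sql + f" ORDER BY {key}", params


def run_pattern(conn, predicate, obj=None):
    """Run the rewritten query and return the rows as RDF-style triples."""
    sql, params = pattern_to_sql(predicate, obj)
    table = PROPERTY_MAP[predicate][0]
    return [(f"{table}/{k}", predicate, v) for k, v in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", "Finance"), (2, "Bob", "Logistics"), (3, "Cy", "Finance")],
)
in_finance = run_pattern(conn, "hr:department", "Finance")
```

The caller only ever uses the domain term hr:department; the table and column names stay hidden behind the mapping.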

Page 27: Michael Lang Sr. Presentation

Relational Schema Ontology

Represents metadata about a relational database schema as instance data

All columns are instances and have properties relating them to their tables

Enables analysis of the way a database is mapped to the Domain Ontology (via the Relational Mapping Ontology)

How many columns are mapped to properties in the Domain Ontology?

How many are mapped to classes?

How is Person represented in the customer management system?
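
The questions on this slide become simple set operations once schema metadata and mappings are both held as instance data. The sketch below assumes hypothetical vocabulary terms (schema:Column, schema:inTable, map:mapsToProperty, map:mapsToClass) and a made-up customer table; none of these names come from the presentation.

```python
# Schema metadata and mappings as triples; every name here is hypothetical.
TRIPLES = [
    ("col:customer.id", "rdf:type", "schema:Column"),
    ("col:customer.id", "schema:inTable", "tbl:customer"),
    ("col:customer.full_name", "rdf:type", "schema:Column"),
    ("col:customer.full_name", "schema:inTable", "tbl:customer"),
    ("col:customer.notes", "rdf:type", "schema:Column"),
    ("col:customer.notes", "schema:inTable", "tbl:customer"),
    ("col:customer.id", "map:mapsToClass", "dom:Person"),
    ("col:customer.full_name", "map:mapsToProperty", "dom:name"),
]


def subjects(triples, predicate, obj=None):
    """Return the subjects that have the predicate (and object, if given)."""
    return {s for s, p, o in triples if p == predicate and (obj is None or o == obj)}


columns = subjects(TRIPLES, "rdf:type", "schema:Column")

# How many columns are mapped to properties? How many to classes?
to_property = subjects(TRIPLES, "map:mapsToProperty") & columns
to_class = subjects(TRIPLES, "map:mapsToClass") & columns

# Which columns are not mapped at all?
unmapped = sorted(columns - to_property - to_class)

# How is Person represented in this (hypothetical) customer system?
person_columns = sorted(subjects(TRIPLES, "map:mapsToClass", "dom:Person"))
```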

Page 28: Michael Lang Sr. Presentation

Analytics Ontology

Enables us to describe questions, queries, reports, forms

We represent questions as instances and relate them to the queries that provide their answers

Queries are related to Domain Ontology concepts

Domain Ontology concepts are mapped to data sources

Enables "gap analysis" of analytic requirements

Are the concepts used in the query to answer this question mapped to the necessary data sources?

Long-term can be used to model-drive a reporting tool

Create instances of "Reports" and the tool builds them
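
The "gap analysis" described above follows a chain from question to query to concept to data source. A minimal sketch, assuming hypothetical questions, queries, concepts, and sources (none taken from the presentation), might look like this:

```python
# Hypothetical instance data: question -> query -> concepts -> data sources.
ANSWERED_BY = {
    "q:headcount_by_dept": "query:headcount",
    "q:attrition_rate": "query:attrition",
}
USES_CONCEPT = {
    "query:headcount": {"dom:Employee", "dom:Department"},
    "query:attrition": {"dom:Employee", "dom:TerminationEvent"},
}
MAPPED_TO = {
    "dom:Employee": {"src:hr_db"},
    "dom:Department": {"src:hr_db"},
    # dom:TerminationEvent has no mapping yet, so it should show up as a gap.
}


def gaps(question):
    """Return the concepts a question needs that are not mapped to any source."""
    query = ANSWERED_BY[question]
    return sorted(c for c in USES_CONCEPT[query] if not MAPPED_TO.get(c))


# One entry per question; an empty list means the question is answerable.
report = {q: gaps(q) for q in ANSWERED_BY}
```

A non-empty list tells the team exactly which concept still needs a mapped data source before the question can be answered.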

Page 29: Michael Lang Sr. Presentation

Process Ontology

Enables description of business processes

RDF/OWL version of BPMN

Enables analysis of the information flows of business process steps in terms of the HR Domain Ontology

Long-term will enable execution of processes described as instances of the ontology

Short-term enables us to link processes with other artifacts in the domain

Domain Ontology concepts

Standards documentation

Discussions - anything

Page 30: Michael Lang Sr. Presentation

How Hard is this?

Many people believe that it is too hard, that there are not enough trained people, and that it takes too long to build the descriptions

So millions of dollars and many years have been spent trying to develop an automated way of doing the modeling

Automated machine learning has not been invented

The machines must be bootstrapped with descriptions

The first bullet is a fallacy

It is not very hard

There are plenty of people that can do this work

It does not take very long to build the models

Page 31: Michael Lang Sr. Presentation

Federation Solution

Enterprise Information Web

Any information from any system can be shared with any other system on the enterprise networks or the World Wide Web

Steps

Describe all of the terms and artifacts in each domain using RDF and OWL

We currently do this description work, but we do not use machine readable standards – Excel, Word, PowerPoint, Visio

The formal description of a domain is called a domain ontology

Describe how all of the information managed in each domain is related to the domain vocabulary

Use these descriptions to say how domains are related

Query the Domain vocabularies for any information

The result is an Enterprise Information Web that meets the goals of information sharing and analysis
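
The step of saying how domains are related, then querying across them, can be sketched as follows. The equivalence statements and the terms hr:Employee and fin:Payee are hypothetical; in OWL such links would typically be stated with constructs like owl:equivalentClass, which this toy code only imitates.

```python
# Hypothetical statements relating terms across two domain vocabularies.
EQUIVALENT = [("hr:Employee", "fin:Payee"), ("hr:employeeId", "fin:payeeId")]


def equivalents(term):
    """Return the term plus every term declared equivalent to it."""
    found, frontier = {term}, [term]
    while frontier:
        current = frontier.pop()
        for a, b in EQUIVALENT:
            # Equivalence is symmetric, so check both directions.
            for x, y in ((a, b), (b, a)):
                if x == current and y not in found:
                    found.add(y)
                    frontier.append(y)
    return found


# Two domain graphs, each described in its own vocabulary.
HR = [("hr:emp/1", "rdf:type", "hr:Employee")]
FIN = [("fin:payee/9", "rdf:type", "fin:Payee")]


def instances(cls, *graphs):
    """Find instances of cls, or of any equivalent class, across all graphs."""
    classes = equivalents(cls)
    return sorted(
        s for g in graphs for s, p, o in g if p == "rdf:type" and o in classes
    )


all_employees = instances("hr:Employee", HR, FIN)
```

A query phrased in HR terms also returns the Finance record, because the relationship between the two vocabularies is itself part of the description.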

Page 32: Michael Lang Sr. Presentation

Enterprise Information Web

1. Information Systems

2. Expose as RDF web services or SPARQL endpoints

3. EIW contains self described data

4. ESB is a big federated knowledgebase of any information

5. Any authorized user or system can query the ESB for any information

[Diagram labels: Relational DB’s, Finance, HR, Logistics, Web sites, Applications, Web Service, sensors, weather, location, RDF Web Service, Federator Web Service, Domain Descriptions Knowledgebase, user]

Page 33: Michael Lang Sr. Presentation

Leverage Existing Investment

We leverage existing infrastructure

Same networks, same security, same applications, same organizations

A lot of this description work is being done now; it simply requires some redirection

Must use standards like any other federation

The result of this relatively minor change and expense is an astounding advance in information management capability

Page 34: Michael Lang Sr. Presentation

EIW Demo

Community Content

Security

Discussions

OWL editing

ASK queries

View Designer/Views

Page 35: Michael Lang Sr. Presentation

Visual Ontology Web Language

Page 36: Michael Lang Sr. Presentation

Visualization

There is no W3C-adopted standard for visual representation of OWL or RDF models

OWL and RDF will not become widely used standards without good visualization of models

We do not believe any existing modeling standard will do; OWL is too different

We need OWL design patterns to fundamentally change information management capability at DOD and elsewhere

The capability will be in beta test in December on knoodl.com