Knowledge Modeling, use of information sources in the study of domains and inter-domain relationships - A Learning Paradigm by Sanjeev Thacker

Knowledge Modeling, use of information sources in the study of domains and inter-domain relationships

- A Learning Paradigm

by

Sanjeev Thacker

Introduction

• Introduction
  – Background
  – ADEPT
  – Problems
  – Contributions

Background

• The Web is an ever-growing source of information

• Information of interest to a user is distributed across multiple heterogeneous sources

• Need for integration to provide a single point of access for querying

ADEPT

• Besides querying, use the data sources to extract useful knowledge

• Provide an environment for studying domains

• Provide means to study and explore complex inter-domain relationships

• Ability to pose complex information requests across multiple domains

Problems

• Diverse and distributed sources

• Web sources unlike databases
  – Unstructured or semi-structured
  – Inconsistencies and overlapping information

• Heterogeneities
  – Semantic
  – Structural
  – Syntactic

Problems

• Representation of complex relationships

• Use of Knowledge Model for complex information request capability with embedded semantic information

Contributions

• Knowledge Model

• Information Scape Model

• Learning Paradigm

• Visual Interfaces

Outline

• Knowledge Modeling

• Information Scapes

• Learning Paradigm

• Visual Interfaces

• Related Work

• Future Work

• Demo

Knowledge Modeling

• Approach to source modeling
  – Global model and source model
  – Source centric / query centric

Source Centric

Advantages
  – Global model independent of source model
  – Modeling a source is independent of other sources
  – Dynamic addition, removal and modification of sources
  – Global view remains unaffected
  – No source mapping required during information integration
  – More suitable for sources other than database sources (web sources)

Knowledge Base

• Comprises
  – Ontologies (Domain model)
  – Resources
  – Relationships
  – Operations

Domain Hierarchy

Ontology

• Standardize meaning, description, representation of involved attributes

• Capture the semantics involved via domain characteristics

• Allow knowledge sharing and reuse

• Resolve resource model differences by mapping them to the global model of the ontology they represent

• Global interface

Ontology

• Description includes
  – Attributes
  – Domain Rules
  – Functional Dependencies

Resource

• Desirable characteristics:
  – Add, modify and delete resources for an ontology dynamically without affecting the system's knowledge
  – Specify the sources in a manner such that one can declaratively query them
  – Since the number of resources is large, there is a need to identify the exact usefulness of resources from the query viewpoint and prune the others

Resource

• Description includes
  – Attributes
  – Binding Patterns
  – Data Characteristics
  – Local Completeness
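
The slides do not show how a resource description is stored or used. As a rough sketch only, the Python below reads "binding pattern" in the sense common in data-integration work: a set of attributes that must be supplied before a source can be queried. The class `Resource`, its fields, and the helper `prune` are hypothetical names for illustration and are not taken from ADEPT; data characteristics and local completeness are left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Resource:
    """Hypothetical description of one information source."""
    name: str
    attributes: FrozenSet[str]  # attributes the source can return
    # Each pattern lists attributes that must be bound to query the source.
    # An empty pattern means the source can be queried without any input.
    binding_patterns: List[FrozenSet[str]] = field(
        default_factory=lambda: [frozenset()]
    )

    def can_answer(self, bound: Set[str], wanted: Set[str]) -> bool:
        """True if some binding pattern is satisfied and the source
        returns at least one of the wanted attributes."""
        provides_something = bool(self.attributes & wanted)
        pattern_satisfied = any(p <= bound for p in self.binding_patterns)
        return provides_something and pattern_satisfied


def prune(resources: List[Resource], bound: Set[str], wanted: Set[str]) -> List[Resource]:
    """Keep only the resources that are useful for this request."""
    return [r for r in resources if r.can_answer(bound, wanted)]
```

Under these assumptions, a source that requires a bound `region` attribute is pruned from any request that does not supply one, which is one way to express "identify the useful resources and prune the others".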

Relationships

• Simple relationships:
  – equals, less-than, like, is-a, is-part-of

• Are hierarchical or similarity based

• Complex relationships
  – “Earthquakes cause tsunamis”, “Nuclear explosions cause earthquakes”, “Air-pollution affects vegetation”

Relationships

• Characteristics
  – Involves multiple ontologies
  – Requires understanding the semantics involved in their interaction
  – Cannot be expressed by simple relational and logical operators alone
  – Involves use of complex operations like functions and simulations

Relationship

• Example
  – “Nuclear explosion causes Earthquakes”

• NuclearTest Causes Earthquake:
  dateDifference(NuclearTest.eventDate, Earthquake.eventDate) < 30
  AND distance(NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude) < 10000
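
The relationship above is a predicate built from two user-defined functions. A minimal Python sketch of that predicate follows. The slides give no units or function bodies, so the sketch assumes days for `dateDifference`, kilometres for `distance`, an absolute (unordered) date difference, and a haversine great-circle distance. The `Event` class and all function bodies are illustrative, not ADEPT's implementation.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical record shared by NuclearTest and Earthquake."""
    event_date: date
    latitude: float   # degrees
    longitude: float  # degrees


def date_difference(a: date, b: date) -> int:
    """Absolute difference in days (assumed unit and assumed to be unordered)."""
    return abs((b - a).days)


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres via the haversine formula (assumed)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, h)))


def causes(test: Event, quake: Event, max_days: int = 30, max_distance: float = 10000) -> bool:
    """NuclearTest Causes Earthquake, using the thresholds from the slide."""
    return (
        date_difference(test.event_date, quake.event_date) < max_days
        and distance(test.latitude, test.longitude, quake.latitude, quake.longitude) < max_distance
    )
```

A stricter causal reading would also require the earthquake to follow the test; the slide's expression does not say either way, so the sketch leaves the ordering check out.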

Operations

• Functions, Simulations

• Functions
  – User defined
  – Used to model the semantics involved in the relationships
  – Used in post processing of result data
  – Examples: distance, dateDifference

• Simulations
  – Independent programs
  – Used for post processing of result data
  – Example: Clarke urban growth model

Information Scape (IScape)

• Representation of an information request across multiple domains

• Can be deployed and executed

• Sources are not explicitly specified, unlike in a query

• System is aware of the sources and is able to identify the useful sources

• Semantic correlation across domains is embedded within the information request

Information Scape

• Definition
  – An IScape may be defined as an information request over distributed, heterogeneous sources of information, involving multiple ontologies and the relationships between them, that contains meta-information constructed to facilitate the bridging of semantic relationships between individual sources.

Information Scape

• Ontologies

• Relationships

• Constraint
  – Conjunctive boolean expression

• Runtime configurable constraint
  – Conceptually different

• Grouping and group constraint
  – Similar to HAVING clause in SQL

• Projection list
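
The components listed on this slide can be pictured as fields of a single request object. The Python sketch below is one hypothetical in-memory shape for such a request, evaluated over already-integrated rows. The class name `IScape`, its field names, and the `evaluate` method are assumptions for illustration; the slides state only that IScapes are stored in XML and do not give a schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


@dataclass
class IScape:
    """Hypothetical container mirroring the slide's component list."""
    ontologies: List[str]
    relationships: List[str]
    constraints: List[Predicate] = field(default_factory=list)          # ANDed together
    runtime_constraints: List[Predicate] = field(default_factory=list)  # set at execution time
    group_by: Optional[str] = None
    group_constraint: Optional[Callable[[List[Row]], bool]] = None      # HAVING-like filter
    projection: List[str] = field(default_factory=list)

    def evaluate(self, rows: List[Row]) -> List[Row]:
        """Apply constraints, then grouping, then projection."""
        predicates = self.constraints + self.runtime_constraints
        kept = [r for r in rows if all(p(r) for p in predicates)]
        if self.group_by is not None:
            groups: Dict[Any, List[Row]] = defaultdict(list)
            for r in kept:
                groups[r[self.group_by]].append(r)
            kept = [
                r
                for g in groups.values()
                if self.group_constraint is None or self.group_constraint(g)
                for r in g
            ]
        return [{k: r[k] for k in self.projection} for r in kept]
```

Keeping the runtime configurable constraint in its own list reflects the slide's note that it is conceptually different from the fixed constraint: it can be replaced per execution without redefining the request.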

Learning Paradigm

• Study of domain

• Use IScapes to study the domain interaction by using relationships

• Relationships could lead to transitive findings

• Explore the hypothetical relationships to validate and establish them or invalidate them

Learning Paradigm

• Data mining
  – Age and breast cancer

• Relationships
  – Nuclear Explosion causes Earthquakes

• Post processing
  – Functions
  – Simulations
  – Charting tool

Learning Paradigm

• Find the earliest recorded Nuclear test conducted

• Plot a graph of the average number of Earthquakes of magnitude greater than 5.8 per year starting from 1900

• Find the average number of Earthquakes of magnitude greater than 5.8 between 1900-1949 and between 1950-present

Learning Paradigm

• Find the average number of Earthquakes of magnitude greater than 7 between 1900-1949 and between 1950-present

• Find pairs of Nuclear tests and Earthquakes that occurred within a certain radius and a certain time period of the explosion
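
The averaging requests in this sequence can be read as: count the qualifying earthquakes in a range of years and divide by the number of years in that range. That per-year reading is an assumption, as are the function name `average_per_year` and the `(year, magnitude)` record layout in the Python sketch below.

```python
from typing import Iterable, Tuple

Quake = Tuple[int, float]  # (year, magnitude), a hypothetical record layout


def average_per_year(
    quakes: Iterable[Quake],
    min_magnitude: float,
    start_year: int,
    end_year: int,
) -> float:
    """Average number of earthquakes per year with magnitude strictly
    greater than min_magnitude, over start_year..end_year inclusive."""
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    count = sum(
        1
        for year, magnitude in quakes
        if start_year <= year <= end_year and magnitude > min_magnitude
    )
    return count / (end_year - start_year + 1)
```

Comparing `average_per_year(data, 5.8, 1900, 1949)` with the same call for 1950 onward gives the before-and-after contrast the slides ask for; the same function with a threshold of 7 covers the follow-up request.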

Visual Interfaces

• Knowledge Builder

• IScape Builder

• Web Interface

• IScape Processing Monitor

Knowledge Builder

• GUI to build the knowledge base
  – Fast and easy to use
  – Manually creating the knowledge could be arduous and error prone

• Knowledge is stored in the standard XML format

• Abstraction from the underlying format and other technical details

Knowledge Builder

• Assists in the creation, deletion and modification of the knowledge base

• Automatically creates a knowledge tree that assists in relating the knowledge in a better manner

Knowledge Builder

Knowledge Hierarchy

IScape Builder

• GUI to create, deploy and execute IScapes in a step by step manner

• IScape stored in XML format

• Abstracts the underlying structure from the user

• Validity checks implemented

• Integrated tools
  – The charting tool to plot charts with the result data

IScape Builder

Web Interface

• Web accessible
  – Knowledge Base
  – Existing IScapes

• Set the runtime configurable constraint

• Execute existing IScapes

• View the tabulated results

• Cannot create new IScapes

Web Interface Result Screen

IScape Processing Monitor

• Color coded log entries describing the IScape processing are generated
  – Brief message along with agent name
  – Time stamp
  – Detailed description and associated data, if any
  – IScape plan for the existing sources
  – Intermediate results

• High level debugging tool
  – Understand execution, locate failures

• Not available with the web interface

Monitor GUI

Related Work

• State of the art
  – SIMS, TSIMMIS, Information Manifold, Observer, Infosleuth

• Mainly focused on a single point of access for querying integrated data of a domain

• What makes ADEPT unique
  – Relationships, IScapes and the learning paradigm distinguish our system from any prior work

Future Work

• Support rules of type “if-then” and use of induction learning to speed up the processing

• Recursive query capability required

• IScape over IScape support required

• Simulations currently supported as specialized functions in our framework

• Statistical analysis tools like SAS for time series analysis, logistic regression