Data Model vs. Ontology
Dr. Tatiana Malyuta
Associate Professor, CUNY
Consultant for DoD
Dr. Barry Smith
UB, NCOR
Data Model – Purpose
• To provide a consistent and efficiently functioning data store for one or more particular business applications
  – Represents specific business concepts in a way that determines the organization of data in the store
  – Commonly used representations are relational and graph; they are supported by data management technologies, e.g. relational: Oracle and MySQL; graph: Neo4j, RDF/OWL stores
• Efficiency requires
  – Application-specific representations
  – Storing only the data needed by the application
• An objective (shared) representation of the domain is not the purpose: multiple data models are built for the same domain to accommodate different business applications
Data Silos
• Numerous partial, idiosyncratic representations of the domain in data models, and numerous versions of data in data stores
• No re-usability
• No single version of truth
[Figure labels: Accounts Receivable, Accounts Payable, Budget]
Ontology – Purpose
• Objectivity of representation of reality
• The commonly used representation is a graph; it is supported by RDF-based semantic technologies
• Objective (shared) representation of the domain
  – One authoritative ontology for the domain of reality, meant for re-use
• Storing vast volumes of data is not the purpose
Financial Ontology
• A single domain ontology (or a collection of ontologies)
• To be re-used in different applications
• Single version of truth (as we know it today)
Note: we discuss ontologies built in accordance with the methodology and architecture pioneered by Dr. Smith.
Comparison
• Although some technologies support a particular paradigm better than others, technology is not the defining factor in distinguishing between a data model and an ontology
• We compare not technologies but paradigms
[Figure: the Person and Skill domain shown as an Ontology and as a Data Model. Labels appearing in the figure: Person, Person Name, First Name, Middle Name, Last Name, Nick Name, Skill, Computer Skill, Programming Skill, Network Skill, Java Skill, Skills, PersonSkill.]
Data Model – Types
• Types are general or repeatable entities capable of being instantiated by indefinitely many particulars
• Data model types and instances are abstractions embodying efficient ways of describing the data about reality that is needed by an application (efficient both for reasoning and for storage)
  – Different abstractions depending on the business need
The data model term ‘person’ is used to define an efficient storage solution for data about persons needed by a particular application.
Ontology – Types
• Ontology types and instances are on the side of reality
• They must provide one term, and one definition, for each salient type of entity in each domain of interest
The ontology term ‘person’, when it is used to represent data about persons, is designed to establish a link between these data and persons in reality.
Data Model – Organization
• Arbitrary combination of selected types suited for efficient data processing
• The data model view of reality is flat and rigid
One of the models needs to be changed to accommodate multiple skills of a person. Such changes can be made only with significant effort, because of the relative rigidity of data representation languages and the need to re-arrange the physical data store.
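The cost of such a change can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables `person` and `person_skill` and their columns are hypothetical, not the models from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical first model: one skill per person, stored in a single flat table.
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "John", "Java"), (2, "Mary", "C++")],
)

# New requirement: a person may have several skills. The flat table cannot
# hold them, so the schema is restructured and the existing rows are migrated.
conn.executescript(
    """
    CREATE TABLE person_skill (
        person_id INTEGER,
        skill TEXT,
        PRIMARY KEY (person_id, skill)
    );
    INSERT INTO person_skill SELECT person_id, skill FROM person;
    CREATE TABLE person_new (person_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person_new SELECT person_id, name FROM person;
    DROP TABLE person;
    ALTER TABLE person_new RENAME TO person;
    """
)

# Only after the migration can a second skill be recorded for John.
conn.execute("INSERT INTO person_skill VALUES (1, 'SQL')")

john_skills = [
    row[0]
    for row in conn.execute(
        "SELECT skill FROM person_skill WHERE person_id = 1 ORDER BY skill"
    )
]
print(john_skills)
```

Every application query written against the old `person.skill` column would also have to be rewritten, which is part of the effort the slide refers to.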
Ontology – Organization
• Each type appears only once in the ontology hierarchy
• The ontology view of reality is synoptic: it represents in non-redundant fashion an entire hierarchy of types at different levels of generality. Each term is associated in an intelligible way with its subsuming and subsumed terms (and thus with the ancestor and descendant types) in the hierarchy of more and less general types
• Representation is more flexible, changes are easier to make, and changes are not as disruptive
Questions?
Data Model vs. Ontology – Types and Individuals
Person Name | Skill
John | Computer Skill
Mary | Sewing Skill
[Figure: type hierarchy with the labels Skill, Computer Skill, Programming Skill, Java, C++]
Person Name | Skill
John | Java
Mary | C++
Data Model – Labels
• Are not as important, because databases are not directly exposed to users; they are presented via an application that exposes the database content using the specific vocabulary of a narrow community of users
• Can be anything, e.g. ‘PN’, ‘PName’, ‘PersName’, ‘PersonN’, etc. for the person name
• The meaning of the label is often derived from the context (e.g. Name for the name of the Person and the name of the Skill in one of the examples)
Ontology – Labels
• Are exposed to users
• Are nouns and noun phrases from natural language, and each type has a unique name that designates the type unambiguously regardless of the context in which the type might be used, e.g. PersonName, SkillName
Closed and Open World Assumptions (impact of technologies)
• Database reasoning is confined to search based on the closed world assumption. If we do not find something in the database, then this means that this something does not exist in the world that is defined by the database.
• Ontologies are based on the idea that we can never describe entities in the real world completely. This means that, from the absence in an ontology of a particular term ‘A’, we cannot infer that As do not exist. It means also that ontologies are constructed in a way which allows easy addition of new types and relations.
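The difference between the two assumptions can be sketched as two ways of answering the same question over the same recorded facts. The fact set and function names below are hypothetical, and `None` is used here to stand for "unknown".

```python
from typing import Optional

# Hypothetical facts recorded about who has which skill.
KNOWN_FACTS = {("John", "Java"), ("Mary", "C++")}


def has_skill_closed_world(person: str, skill: str) -> bool:
    """Closed world: anything not recorded is taken to be false."""
    return (person, skill) in KNOWN_FACTS


def has_skill_open_world(person: str, skill: str) -> Optional[bool]:
    """Open world: anything not recorded is unknown (None), not false."""
    return True if (person, skill) in KNOWN_FACTS else None


print(has_skill_closed_world("John", "SQL"))  # False
print(has_skill_open_world("John", "SQL"))    # None
```

Under the open world reading, recording a new fact later does not contradict any earlier answer, which fits the point about easy addition of new types and relations.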
Life Span
• Data models are created in ad hoc ways to capture a targeted selection of features; a data model usually is not reused, which results in numerous data silos for a domain
• Ontologies will grow and expand as new knowledge is gained over time
Summary of Comparison
Dimension of Comparison | Traditional Data Model | Ontologies
Closeness to reality | Variable, application-specific | Reality is always the prime focus
Conceptualization of the domain | Plain and partial (always at the level of detail needed for a particular implementation) | Hierarchical, simultaneously describing the same domain at different levels of detail
Vocabulary | Application-specific, not intended for sharing | Application-independent, intended to support sharing and reuse
Structures or organization of types | Groupings of types to accommodate data access patterns | Taxonomies (type hierarchies) always used to describe/classify the domain
Combinability | Can rarely be combined; even if possible, this will typically require significant manual effort | If the ontology building methodology is followed, then the results will be combinable automatically
Flexibility | Rigid, changes normally require significant effort | Flexible, changes can normally be effected very easily
Semantic Enhancement of Data Models by Ontology
• Semantic Enhancement (SE) is realized with the help of ontologies that are used to explicate data models and annotate data instances
  – The vocabulary of the ontologies used for explications and annotations provides agile horizontal integration
  – Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database
[Figure: ontology fragment with the labels SQL, Java, C++, ProgrammingSkill, ComputerSkill, Skill, Education, TechnicalEducation]
The Meaning of ‘Enhancement’
• Semantic enhancement/enrichment of data = arm’s length approach (no change to data)
  – Through simple explication we associate an entire knowledge system with a database field
  – Enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education
  – And further: while data in the database does not change, its analysis can become richer and richer as our understanding of reality changes
• For this richness to be leveraged by different communities, persons, and applications, it needs to have the properties mentioned above and be constructed in accordance with the principles of SE (see References)
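The arm's length approach can be sketched as an annotation layer kept outside the data. The rows reuse the example table (PersonID, Name, Description); the placement of SQL, Java and C++ under ProgrammingSkill, the `ACQUIRED_THROUGH` relation, and all function names are assumptions made for illustration.

```python
# Database rows stay exactly as they are.
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Hypothetical explication: values of the Name field mapped to ontology terms.
ANNOTATION = {"Java": "Java", "SQL": "SQL"}

# Hypothetical fragment of the Skill hierarchy (child -> parent).
PARENT = {
    "Java": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
}

# Hypothetical "horizontal" relation between a Skill type and an Education type.
ACQUIRED_THROUGH = {"ComputerSkill": "TechnicalEducation"}


def subsumers(term):
    """Return the term followed by all of its ancestors in the hierarchy."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def persons_with(skill_type):
    """Vertical query: rows whose annotated skill falls under skill_type."""
    return [
        row["PersonID"]
        for row in ROWS
        if skill_type in subsumers(ANNOTATION[row["Name"]])
    ]


def related_education(person_id):
    """Horizontal query: education types related to the person's skill types."""
    found = set()
    for row in ROWS:
        if row["PersonID"] == person_id:
            for term in subsumers(ANNOTATION[row["Name"]]):
                if term in ACQUIRED_THROUGH:
                    found.add(ACQUIRED_THROUGH[term])
    return sorted(found)


print(persons_with("ComputerSkill"))
print(related_education(111))
```

Neither query touches `ROWS`; extending `PARENT` or `ACQUIRED_THROUGH` later enriches the analysis without any change to the stored data.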
SE and Data Integration
• Traditional integration approaches involve creation of a new model used in
  – A new physical store (data warehouse)
    • Expensive, resource- and time-consuming
    • Another data store: rigid (potential data silo), not interoperable with other stores
  – Querying the data sources via it
    • Fragile
• Both entail loss and/or distortion of data and semantics, and provide only ‘local’ integration (do not lead to interoperability with other sources)
• SE of a store
  – Does not require data reorganization and creation of another store
  – Changes to it are non-intrusive
  – Leads to integration of the store with other stores, enhanced previously or in the future
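A minimal sketch of integration through explication, assuming two stores whose local labels are mapped to shared ontology terms. The store contents, the store names, and the helper functions are hypothetical; the shared terms PersonName and SkillName follow the label examples given earlier.

```python
# Two hypothetical stores with their own application-specific labels.
HR_STORE = [{"PName": "John", "Sk": "Java"}]
PROJECT_STORE = [{"PersName": "Mary", "Expertise": "C++"}]

STORES = {"hr": HR_STORE, "project": PROJECT_STORE}

# Explication: each local label is mapped to a shared ontology term.
# The stores themselves are neither modified nor copied into a new store.
EXPLICATIONS = {
    "hr": {"PName": "PersonName", "Sk": "SkillName"},
    "project": {"PersName": "PersonName", "Expertise": "SkillName"},
}


def integrated_view():
    """Yield every record re-keyed by the shared ontology terms."""
    for store_name, rows in STORES.items():
        mapping = EXPLICATIONS[store_name]
        for row in rows:
            yield {mapping[label]: value for label, value in row.items()}


def add_store(name, rows, mapping):
    """A store enhanced later joins the view without touching earlier ones."""
    STORES[name] = rows
    EXPLICATIONS[name] = mapping


people = sorted(record["PersonName"] for record in integrated_view())
print(people)

add_store(
    "training",
    [{"PN": "Ann", "Course": "SQL"}],
    {"PN": "PersonName", "Course": "SkillName"},
)
people_after = sorted(record["PersonName"] for record in integrated_view())
print(people_after)
```

The mapping tables are the only new artifacts; the original stores keep their structure and labels, which is the non-intrusive property claimed for SE.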
References
• Barry Smith, et al., IAO-Intel – An Ontology of Information Artifacts in the Intelligence Domain, STIDS Conference, 2013.
• Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community, STIDS Conference, 2012.
• Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, Ontology for the Intelligence Analyst, Crosstalk: The Journal of Defense Software Engineering, 2012.
• David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, Integration of Intelligence Data through Semantic Enhancement, STIDS Conference, 2011.
Questions?