The Preparation of Information in Data Science
The Role of Ontologies in Unlocking Big Data
• Big Data holds the potential to reveal great insights from large, diverse data sets if it is properly exploited with the right analytics
• To better realize this potential, a shift needs to occur from representations of individual data sets to representations that enable interoperability across all data sets
The Common Core Ontology Development Method
• Rule-governed development of an extensible set of ontologies to which data from sub-domains can be aligned and linked together
• Combines principles from the Linked Open Data Initiative, the Open Biological and Biomedical Ontologies (OBO) Foundry, and object-oriented programming
Linked Open Data Initiative
• Began as a means for integrating data on the World Wide Web
• Based on a simple set of guiding principles*
  – Use Uniform Resource Identifiers (URIs) as names of things
  – Use HTTP URIs so that people can look up those names
  – When someone looks up a URI, provide useful information
  – Include links to other URIs so that they can discover other things
*Tim Berners-Lee, “Linked Open Data,” https://www.w3.org/DesignIssues/LinkedData
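The four principles can be illustrated with a toy simulation. This is a minimal sketch under assumptions: a Python dictionary stands in for the Web, the hypothetical `look_up` function stands in for an HTTP GET, and the example.org URIs and their contents are invented. Starting from a single URI, following the links in each description is enough to discover the other things.

```python
from collections import deque

# Hypothetical stand-in for the Web: each HTTP URI "dereferences" to useful
# information (a label) plus links to other URIs. All URIs here are invented.
WEB = {
    "http://example.org/id/Buffalo": {
        "label": "Buffalo",
        "links": ["http://example.org/id/NewYorkState"],
    },
    "http://example.org/id/NewYorkState": {
        "label": "New York State",
        "links": ["http://example.org/id/UnitedStates"],
    },
    "http://example.org/id/UnitedStates": {
        "label": "United States",
        "links": [],
    },
}


def look_up(uri):
    """Simulate looking up a URI; return its description, or None if nothing is served."""
    return WEB.get(uri)


def discover(start_uri):
    """Follow links breadth-first from one URI and return labels in discovery order."""
    seen = {start_uri}
    queue = deque([start_uri])
    labels = []
    while queue:
        uri = queue.popleft()
        description = look_up(uri)
        if description is None:  # the URI provides no useful information
            continue
        labels.append(description["label"])
        for link in description["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return labels


print(discover("http://example.org/id/Buffalo"))
# ['Buffalo', 'New York State', 'United States']
```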
A Linked Open Data Success Story
DBpedia
• Pages, viewable in a web browser, that link data extracted from Wikipedia
Linked Open Data Issue - A Profusion of Ontologies
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Effects of Profusion
• Costs increase
  – relative to the amount of duplicative effort
  – relative to the number of mappings
  – relative to the number of vernaculars
• Effectiveness decreases
  – Searches have low recall and precision
  – Re-use creates ambiguities
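One way to see why costs grow with the number of mappings: if every pair of independently developed ontologies must be mapped directly, the count grows quadratically, n(n-1)/2, whereas aligning each ontology once to a shared ontology needs only n. The sketch below assumes exactly one mapping per pair of ontologies, which is a simplification; real alignments vary widely in size and difficulty.

```python
def pairwise_mappings(n):
    """Mappings needed if each of n ontologies is mapped directly to every other one."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Mappings needed if each of n ontologies is mapped once to a shared ontology."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 5 10 5
# 20 190 20
# 100 4950 100
```

Under this simplified model, a shared ontology keeps the mapping effort linear in the number of sources rather than quadratic.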
OBO Foundry
• The Open Biological and Biomedical Ontologies (OBO) Foundry is a collaborative group of organizations devoted to establishing best practices in ontology development
  – Leverages the lessons learned from over $300M of investment in ontology development
An OBO Foundry Best Practice – Use a Common Upper Ontology
• Produces common patterns within ontologies
  – Reuse of mappings from the sources
  – Easier to include new sources of data
• Enables reuse of queries and analytics
  – Structure of data stays constant
  – Easier to transition to new domains of interest
Figure: Entity, Object, Organization, Physical Artifact, Quality, Quality of Organization, and Quality of Physical Artifact, connected by has_quality and bearer_of relations
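The figure's pattern can be sketched in Python to show why a shared upper level enables reuse of queries. This is a minimal sketch under assumptions: the class hierarchy (Organization and PhysicalArtifact as subclasses of Object, the two quality types as subclasses of Quality) and the treatment of has_quality as a specialization of bearer_of are one reading of the figure, not the actual BFO or CCO axioms, and the instance names are invented.

```python
class Entity:
    """Most general class in this sketch."""

    def __init__(self, name):
        self.name = name


class Quality(Entity):
    pass


class QualityOfOrganization(Quality):
    pass


class QualityOfPhysicalArtifact(Quality):
    pass


class Object(Entity):
    def __init__(self, name):
        super().__init__(name)
        self.has_quality = []  # specific assertions made about instances of subclasses

    def bearer_of(self):
        """Upper-level relation: here every has_quality assertion also counts as bearer_of."""
        return list(self.has_quality)


class Organization(Object):
    pass


class PhysicalArtifact(Object):
    pass


def qualities_of(objects):
    """One query written against Object and bearer_of; it never names the subclasses."""
    return {obj.name: [q.name for q in obj.bearer_of()] for obj in objects}


acme = Organization("Acme Shipping")
acme.has_quality.append(QualityOfOrganization("size of Acme Shipping"))
hull = PhysicalArtifact("Hull 42")
hull.has_quality.append(QualityOfPhysicalArtifact("mass of Hull 42"))

print(qualities_of([acme, hull]))
# {'Acme Shipping': ['size of Acme Shipping'], 'Hull 42': ['mass of Hull 42']}
```

Because `qualities_of` depends only on the upper-level pattern, it keeps working unchanged when a new subclass of Object is added for a new data source.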
Basic Formal Ontology
• An upper ontology with not more than 40 class terms and 20 relationships
• Provides an extensible structure for the interrelationships between basic entities
• Used as the upper ontology in hundreds of ontologies, primarily in the biomedical domain
• Used by at least one hundred different projects
An OBO Foundry Best Practice - Truth as a Development Guideline
Strive towards creating a digital copy of the world
Reduces the role of perspective in the ontology, enabling links to many sources
Provides an objective means for settling disputes over terminology
Adds the constraint that every assertion within an ontology must be true
OBO Foundry Issue - Ontologies with Too Wide a Scope
• Reusing existing terminology is good practice
• But the Ontology for Biomedical Investigations (OBI) is not a logical place to maintain the term “Organization”
Object Oriented Programming - Modularity as a Development Guideline
One axis of modularity in the CCO is the level of generality
Content and structure are inherited from higher levels
Upper ontologies describe the structure of the world
Mid-level ontologies add general content to the structure
Domain-level ontologies add content relevant to a community
Upper and mid-level ontologies are stable and of manageable scale
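The inheritance between levels behaves much like module imports. The sketch below is hypothetical: the import edges between Basic Formal Ontology, a Quality Ontology, an Artifact Ontology, and a Watercraft Ontology are assumed for illustration and are not the published CCO import structure. It computes what a domain ontology inherits and, in the other direction, which modules a change would reach, which is one way to see why the upper and mid levels need to stay stable.

```python
# Hypothetical import graph: each module lists the modules it directly extends.
# These edges are illustrative assumptions, not the actual CCO imports.
IMPORTS = {
    "Basic Formal Ontology": [],
    "Quality Ontology": ["Basic Formal Ontology"],
    "Artifact Ontology": ["Basic Formal Ontology", "Quality Ontology"],
    "Watercraft Ontology": ["Artifact Ontology"],
}


def inherited_modules(module):
    """Every module whose content and structure `module` inherits, most general first."""
    ordered = []

    def visit(current):
        for parent in IMPORTS[current]:
            if parent not in ordered:
                visit(parent)  # place the parent's own ancestors before it
                ordered.append(parent)

    visit(module)
    return ordered


def affected_by_change(module):
    """Modules that would inherit a change made to `module`, sorted by name."""
    return sorted(m for m in IMPORTS if module in inherited_modules(m))


print(inherited_modules("Watercraft Ontology"))
# ['Basic Formal Ontology', 'Quality Ontology', 'Artifact Ontology']
print(affected_by_change("Basic Formal Ontology"))
# ['Artifact Ontology', 'Quality Ontology', 'Watercraft Ontology']
print(affected_by_change("Watercraft Ontology"))
# []
```

A change at the top of the graph reaches every module below it, while a change to a domain module stays local.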
Object Oriented Programming - Modularity as a Development Guideline
The second axis of modularity in the CCO is content
Figure: Attribute, Physical Object, Process, Site, and Temporal Region, connected by the relations has, participates in, occurs at, occurs on, and contained in
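To show how content modules connect through shared relations, here is a minimal sketch over hypothetical triples. The subject and object types assigned to each relation (a physical object has an attribute and participates in a process; a process occurs at a site and on a temporal region; a site is contained in another site) are an assumed reading of the figure, and every instance name is invented. A single query then draws on the process, site, and time modules together.

```python
# Hypothetical (subject, relation, object) triples; instance names are invented
# and the relation directions are an assumed reading of the figure.
TRIPLES = [
    ("Hull 42", "has", "mass of Hull 42"),            # Physical Object has Attribute
    ("Hull 42", "participates in", "Voyage 7"),       # Physical Object participates in Process
    ("Voyage 7", "occurs at", "Pier 3"),              # Process occurs at Site
    ("Voyage 7", "occurs on", "Interval T1"),         # Process occurs on Temporal Region
    ("Pier 3", "contained in", "Harbor A"),           # Site contained in Site
]


def objects_of(subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def containing_sites(site):
    """Follow 'contained in' transitively, nearest site first."""
    result = []
    for parent in objects_of(site, "contained in"):
        result.append(parent)
        result.extend(containing_sites(parent))
    return result


def where_and_when(thing):
    """For each process `thing` participates in, report its sites and its time."""
    summary = []
    for process in objects_of(thing, "participates in"):
        for site in objects_of(process, "occurs at"):
            for time in objects_of(process, "occurs on"):
                summary.append((process, [site] + containing_sites(site), time))
    return summary


print(where_and_when("Hull 42"))
# [('Voyage 7', ['Pier 3', 'Harbor A'], 'Interval T1')]
```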
The Common Core Ontologies in Practice
• The Common Core Ontologies (CCO) are intended to serve as a vocabulary that can describe objects and processes that are common to many domains of interest
• The remaining objects and processes that are unique to particular domains of interest are described by ontologies that extend from the CCO in a repeatable, rule-governed process
The Common Core and Domain Ontologies
Upper Ontology: Basic Formal Ontology (BFO)
Common Core Ontologies: Extended Relation Ontology, Time Ontology, Quality Ontology, Information Entity Ontology, Geospatial Ontology, Event Ontology, Artifact Ontology, Agent Ontology, Units of Measure Ontology, Currency Unit Ontology
Domain Ontologies: Affective State Ontology, Ethnicity Ontology, Occupation Ontology, Hydrographic Feature Ontology, Physiographic Feature Ontology, Curriculum Ontology, Citizenship Ontology, Watercraft Ontology, Sensor Ontology, Agent Information Ontology, Undersea Warfare Ontology, Space Object Ontology
The Benefits of the Common Core Ontology Development Process