Chapter 7 Ontology Engineering
-
Upload
madaline-giles -
Category
Documents
-
view
48 -
download
4
description
Transcript of Chapter 7 Ontology Engineering
Chapter 7 A Semantic Web Primer1
Chapter 7Ontology Engineering
Grigoris Antoniou
Frank van Harmelen
Chapter 7 A Semantic Web Primer2
Lecture Outline
1. Introduction
2. Constructing Ontologies Manually
3. Reusing Existing Ontologies
4. Using Semiautomatic Methods
5. On-To-Knowledge SW Architecture
Chapter 7 A Semantic Web Primer3
Methodological Questions
– How can tools and techniques best be applied? – Which languages and tools should be used in
which circumstances, and in which order? – What about issues of quality control and resource
management? Many of these questions for the Semantic
Web have been studied in other contexts– E.g. software engineering, object-oriented design,
and knowledge engineering
Chapter 7 A Semantic Web Primer4
Lecture Outline
1. Introduction
2. Constructing Ontologies Manually
3. Reusing Existing Ontologies
4. Using Semiautomatic Methods
5. On-To-Knowledge SW Architecture
Chapter 7 A Semantic Web Primer5
Main Stages in Ontology Development
1. Determine scope2. Consider reuse3. Enumerate terms4. Define taxonomy5. Define properties6. Define facets7. Define instances8. Check for anomaliesNot a linear process!
Chapter 7 A Semantic Web Primer6
Determine Scope
There is no correct ontology of a specific domain – An ontology is an abstraction of a particular
domain, and there are always viable alternatives
What is included in this abstraction should be determined by – the use to which the ontology will be put– by future extensions that are already anticipated
Chapter 7 A Semantic Web Primer7
Determine Scope (2)
Basic questions to be answered at this stage are: – What is the domain that the ontology will cover? – For what we are going to use the ontology? – For what types of questions should the ontology
provide answers? – Who will use and maintain the ontology?
Chapter 7 A Semantic Web Primer8
Consider Reuse
With the spreading deployment of the Semantic Web, ontologies will become more widely available
We rarely have to start from scratch when defining an ontology – There is almost always an ontology available from
a third party that provides at least a useful starting point for our own ontology
Chapter 7 A Semantic Web Primer9
Enumerate Terms
Write down in an unstructured list all the relevant terms that are expected to appear in the ontology
– Nouns form the basis for class names– Verbs (or verb phrases) form the basis for property names
Traditional knowledge engineering tools (e.g. laddering and grid analysis) can be used to obtain
– the set of terms – an initial structure for these terms
Chapter 7 A Semantic Web Primer10
Define Taxonomy
Relevant terms must be organized in a taxonomic hierarchy
– Opinions differ on whether it is more efficient/reliable to do this in a top-down or a bottom-up fashion
Ensure that hierarchy is indeed a taxonomy:– If A is a subclass of B, then every instance of A
must also be an instance of B (compatible with semantics of rdfs:subClassOf
Chapter 7 A Semantic Web Primer11
Define Properties
Often interleaved with the previous step The semantics of subClassOf demands that
whenever A is a subclass of B, every property statement that holds for instances of B must also apply to instances of A– It makes sense to attach properties to the highest
class in the hierarchy to which they apply
Chapter 7 A Semantic Web Primer12
Define Properties (2)
While attaching properties to classes, it makes sense to immediately provide statements about the domain and range of these properties
There is a methodological tension here between generality and specificity:– Flexibility (inheritance to subclasses)– Detection of inconsistencies and misconceptions
Chapter 7 A Semantic Web Primer13
Define Facets: From RDFS to OWL
Cardinality restrictions Required values
– owl:hasValue – owl:allValuesFrom– owl:someValuesFrom
Relational characteristics– symmetry, transitivity, inverse properties,
functional values
Chapter 7 A Semantic Web Primer14
Define Instances
Filling the ontologies with such instances is a separate step
Number of instances >> number of classes Thus populating an ontology with instances is
not done manually – Retrieved from legacy data sources (DBs)– Extracted automatically from a text corpus
Chapter 7 A Semantic Web Primer15
Check for Anomalies
An important advantage of the use of OWL over RDF Schema is the possibility to detect inconsistencies
– In ontology or ontology+instances
Examples of common inconsistencies – incompatible domain and range definitions for transitive,
symmetric, or inverse properties– cardinality properties – requirements on property values can conflict with domain
and range restrictions
Chapter 7 A Semantic Web Primer16
Lecture Outline
1. Introduction
2. Constructing Ontologies Manually
3. Reusing Existing Ontologies
4. Using Semiautomatic Methods
5. On-To-Knowledge SW Architecture
Chapter 7 A Semantic Web Primer17
Existing Domain-Specific Ontologies
Medical domain: Cancer ontology from the National Cancer Institute in the United States
Cultural domain:– Art and Architecture Thesaurus (AAT) with 125,000 terms
in the cultural domain – Union List of Artist Names (ULAN), with 220,000 entries on
artists– Iconclass vocabulary of 28,000 terms for describing cultural
images Geographical domain: Getty Thesaurus of
Geographic Names (TGN), containing over 1 million entries
Chapter 7 A Semantic Web Primer18
Integrated Vocabularies
Merge independently developed vocabularies into a single large resource
E.g. Unified Medical Language System integrating100 biomedical vocabularies – The UMLS metathesaurus contains 750,000 concepts, with
over 10 million links between them
The semantics of a resource that integrates many independently developed vocabularies is rather low– But very useful in many applications as starting point
Chapter 7 A Semantic Web Primer19
Upper-Level Ontologies
Some attempts have been made to define very generally applicable ontologies – Mot domain-specific
Cyc, with 60,000 assertions on 6,000 concepts
Standard Upperlevel Ontology (SUO)
Chapter 7 A Semantic Web Primer20
Topic Hierarchies
Some “ontologies” do not deserve this name: – simply sets of terms, loosely organized in a hierarchy
This hierarchy is typically not a strict taxonomy but rather mixes different specialization relations (e.g. is-a, part-of, contained-in)
Such resources often very useful as starting point Example: Open Directory hierarchy, containing
more then 400,000 hierarchically organized categories and available in RDF format
Chapter 7 A Semantic Web Primer21
Linguistic Resources
Some resources were originally built not as abstractions of a particular domain, but rather as linguistic resources
These have been shown to be useful as starting places for ontology development – E.g. WordNet, with over 90,000 word senses
Chapter 7 A Semantic Web Primer22
Ontology Libraries
Attempts are currently underway to construct online libraries of online ontologies
– Rarely existing ontologies can be reused without changes– Existing concepts and properties must be refined using
rdfs:subClassOf and rdfs:subPropertyOf – Alternative names must be introduced which are better
suited to the particular domain using owl:equivalentClass and owl:equivalentProperty
– We can exploit the fact that RDF and OWL allow private refinements of classes defined in other ontologies
Chapter 7 A Semantic Web Primer23
Lecture Outline
1. Introduction
2. Constructing Ontologies Manually
3. Reusing Existing Ontologies
4. Using Semiautomatic Methods
5. On-To-Knowledge SW Architecture
Chapter 7 A Semantic Web Primer24
The Knowledge Acquisition Bottleneck
Manual ontology acquisition remains a time-consuming, expensive, highly skilled, and sometimes cumbersome task
Machine Learning techniques may be used to alleviate– knowledge acquisition or extraction – knowledge revision or maintenance
Chapter 7 A Semantic Web Primer25
Tasks Supported by Machine Learning
Extraction of ontologies from existing data on the Web
Extraction of relational data and metadata from existing data on the Web
Merging and mapping ontologies by analyzing extensions of concepts
Maintaining ontologies by analyzing instance data Improving SW applications by observing users
Chapter 7 A Semantic Web Primer26
Useful Machine Learning Techniques for Ontology Engineering
Clustering Incremental ontology updates Support for the knowledge engineer Improving large natural language ontologies Pure (domain) ontology learning
Chapter 7 A Semantic Web Primer27
Machine Learning Techniques for Natural Language Ontologies
Natural language ontologies (NLOs) contain lexical relations between language concepts
– They are large in size and do not require frequent updates
The state of the art in NLO learning looks quite optimistic:
– A stable general-purpose NLO exist – Techniques for automatically or semi-automatically
constructing and enriching domain-specific NLOs exist
Chapter 7 A Semantic Web Primer28
Machine Learning Techniques for Domain Ontologies
They provide detailed descriptions Usually they are constructed manually The acquisition of the domain ontologies is
still guided by a human knowledge engineer– Automated learning techniques play a minor role
in knowledge acquisition– They have to find statistically valid dependencies
in the domain texts and suggest them to the knowledge engineer
Chapter 7 A Semantic Web Primer29
Machine Learning Techniques for Ontology Instances
Ontology instances can be generated automatically and frequently updated while the ontology remains unchanged
Fits nicely into a machine learning framework Successful ML applications
– Are strictly dependent on the domain ontology, or – Populate the markup without relating to any
domain theory – General-purpose techniques not yet available
Chapter 7 A Semantic Web Primer30
Different Uses of Ontology Learning
Ontology acquisition tasks in knowledge engineering – Ontology creation from scratch by the knowledge engineer– Ontology schema extraction from Web documents– Extraction of ontology instances from Web documents
Ontology maintenance tasks– Ontology integration and navigation– Updating some parts of an ontology– Ontology enrichment or tuning
Chapter 7 A Semantic Web Primer31
Ontology Acquisition Tasks
Ontology creation from scratch by the knowledge engineer
– ML assists the knowledge engineer by suggesting the most important relations in the field or checking and verifying the constructed knowledge bases
Ontology schema extraction from Web documents– ML takes the data and meta-knowledge (like a meta-
ontology) as input and generate the ready-to-use ontology as output with the possible help of the knowledge engineer
Chapter 7 A Semantic Web Primer32
Ontology Acquisition Tasks(2)
Extraction of ontology instances from Web documents – This task extracts the instances of the ontology
presented in the Web documents and populates given ontology schemas
– This task is similar to information extraction and page annotation, and can apply the techniques developed in these areas
Chapter 7 A Semantic Web Primer33
Ontology Maintenance Tasks
Ontology integration and navigation – Deals with reconstructing and navigating in large
and possibly machine-learned knowledge bases
Updating some parts of an ontology that are designed to be updated
Ontology enrichment or tuning – This does not change major concepts and
structures but makes an ontology more precise
Chapter 7 A Semantic Web Primer34
Potentially Applicable Machine Learning Algorithms
Propositional rule learning algorithms Bayesian learning
– generates probabilistic attribute-value rules
First-order logic rules learning Clustering algorithms
– They group the instances together based on the similarity or distance measures between a pair of instances defined in terms of their attribute values
Chapter 7 A Semantic Web Primer35
Lecture Outline
1. Introduction
2. Constructing Ontologies Manually
3. Reusing Existing Ontologies
4. Using Semiautomatic Methods
5. On-To-Knowledge SW Architecture
Chapter 7 A Semantic Web Primer36
On-To-Knowledge Architecture
Building the Semantic Web involves using– the new languages described in this course– a rather different style of engineering – a rather different approach to application integration
We describe how a number of Semantic Web-related tools can be integrated in a single lightweight architecture using Semantic Web standards to achieve interoperability between tools
Chapter 7 A Semantic Web Primer37
Knowledge Acquisition
Initially, tools must exist that use surface analysis techniques to obtain content from documents – Unstructured natural language documents:
statistical techniques and shallow natural language
technology – Structured and semi-structured documents:
wrappers induction, pattern recognition
Chapter 7 A Semantic Web Primer38
Knowledge Storage
The output of the analysis tools is sets of concepts, organized in a shallow concept hierarchy with at best very few cross-taxonomical relationships
RDF/RDF Schema are sufficiently expressive to represent the extracted info
– Store the knowledge produced by the extraction tools– Retrieve this knowledge, preferably using a structured
query language (e.g. RQL)
Chapter 7 A Semantic Web Primer39
Knowledge Maintenance and Use
A practical Semantic Web repository must provide functionality for managing and maintaining the ontology:
– change management– access and ownership rights– transaction management
There must be support for both– Lightweight ontologies that are automatically generated
from unstructured and semi-structured data– Human engineering of much more knowledge-intensive
ontologies
Chapter 7 A Semantic Web Primer40
Knowledge Maintenance and Use (2)
Sophisticated editing environments must be able to – Retrieve ontologies from the repository– Allow a knowledge engineer to manipulate it– Place it back in the repository
The ontologies and data in the repository are to be used by applications that serve an end-user
– We have already described a number of such applications
Chapter 7 A Semantic Web Primer41
Technical Interoperability
Syntactic interoperability was achieved because all components communicated in RDF
Semantic interoperability was achieved because all semantics was expressed using RDF Schema
Physical interoperability was achieved because All communications between components were
established using simple HTTP connections
Chapter 7 A Semantic Web Primer42
On-To-Knowledge System Architecture