8/14/2019 Web Search Systems
Chapter 2
BACKGROUND AND LITERATURE REVIEW
Web information resources are growing explosively in number and volume, but retrieving relevant
information from them is becoming increasingly difficult and time-consuming. The way Web information
resources are stored is considered a root cause of irrelevant retrieval results, because the resources are
not organized in a machine-understandable form. It has been estimated [12] that nearly 48 to
63 percent of retrieved results are irrelevant; that is, relevant results lie in the range of 37 to 52
percent, as shown in Table 2.1, which is far from accurate. An extension of the current Web, known as the
Semantic Web, has been envisioned to organize Web content through ontologies in order to make it
machine-understandable. To support the Semantic Web vision, there is a crucial need for techniques for
designing, developing, populating and integrating ontologies.
2.1 Semantic Web Vs Current Web
The Semantic Web may be compared with the non-semantic Web along several parameters, such as content,
conceptual perception, scope, environment and resource utilization.
(a) Content: The Semantic Web encompasses actual content along with its formal semantics. Here, formal
semantics means machine-understandable content, expressed in logic-based languages such as the Web
Ontology Language (OWL) recommended by the W3C. Through the formal semantics of content, computers can
make inferences about data, i.e., understand what a data resource is and how it relates to other data. In
today's Web there are no formal semantics for existing content; the content is machine-readable but not
machine-understandable.
Table 2.1: Relevant results (from top 20 results) of some search engines

Search Engine   Precision
Yahoo           0.52
Google          0.48
MSN             0.37
Ask             0.44
Seekport        0.37
(b) Conceptual Perception: The current Web is like a book of multiple hyperlinked documents. In the
book scenario, an index of keywords is provided, but the contexts in which those keywords are used are
missing from the index; that is, the index carries no formal semantics for its keywords. To check
which occurrence is relevant, we have to read the corresponding pages of the book. The same holds for the
current Web. In the Semantic Web this limitation will be eliminated via ontologies, where data is given
well-defined meanings understandable by machines.
(c) Scope: The literature survey determined that the inaccessible part of the Web is about
five hundred times larger than the accessible one [13]. It is estimated that billions of pages of information
are available on the Web, and only a few of them can be reached via traditional search engines. In the
Semantic Web, formal semantics of data are available via ontologies, and ontologies are the essential
component of the Semantic Web accessible to semantic search engines.
(d) Environment: The Semantic Web is a web of ontologies holding data with formal meanings. This is in
contrast to the current Web, which contains virtually boundless information in the form of documents. The
Semantic Web, on the other hand, offers data as well as documents that machines can process, transform,
assemble, and even act on in useful ways.
(e) Resource Utilization: There are many Web resources that could be very useful in our everyday
activities. In the current Web it is difficult to locate them, because they are not properly annotated
with machine-understandable metadata. The Semantic Web will form a network of related resources, making
them very easy to locate and use. There are further criteria for comparing the current Web and the
Semantic Web. For example, searching, accessing, extracting, interpreting and processing information on
the Semantic Web will be easier and more efficient; the Semantic Web will have inference (reasoning)
capability; and network and communication costs will be reduced because results are relevant. Some of
these factors are listed in Table 2.2.
Table 2.2: Semantic vs. current Web

Sr. No | Factors | (Non-Semantic) Web | Semantic Web
1. | Conceptual perception | Large hyperlinked book | Large interlinked database
2. | Content | No formal meanings | Formally defined
3. | Scope | Limited; invisible web probably excluded | Boundless; invisible web probably included
4. | Environment | Web of documents | Web of ontologies, data and documents
5. | Resource utilization | Minimum-normal | Maximum
6. | Inference/reasoning capability | No | Yes
7. | Knowledge management application support | No | Yes
8. | Information searching, accessing, extracting | Difficult and time-consuming | Easy and efficient
9. | Timeliness, accuracy, transparency of information | Less | More
10. | Semantic heterogeneity | More | Less
11. | Ingredients | Content, presentation | Content, presentation, formal semantics
12. | Text simplification and clarification | No | Yes
According to the Semantic Web vision, if the explicit semantics of Web resources are put together with
their linguistic semantics and represented in a logic-based language, then the limitations of the current
Web can be handled. To support this vision, the W3C has recommended standards [14] such as RDF (Resource
Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language), SPARQL (a query language for RDF)
and GRDDL (Gleaning Resource Descriptions from Dialects of Languages). RDF provides the data model of a
Semantic Web application: resources are represented by URIs and are connected through labeled edges,
which are also represented by URIs. RDF is represented
through a language called RDFS, and a more powerful language, OWL, can also be used to describe an RDF
model. A query language such as SPARQL can be used to query an RDF model. The Semantic Web vision has now
become a reality [15, 16]. Several Semantic Web systems have been developed; a subset of these systems is
given in Table 2.3. A huge number of ontology-based Web documents have been published.
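The RDF model described above can be sketched in a few lines of code. The following is a minimal, illustrative sketch (not a real triple store): a graph is a set of (subject, predicate, object) triples whose resources and edge labels are URIs, and a SPARQL-style basic graph pattern amounts to matching triples against a template in which some positions are variables. All URIs below are invented for illustration.

```python
# A graph as a set of (subject, predicate, object) triples; every
# resource and every edge label is a URI (invented example data).
EX = "http://example.org/"

triples = {
    (EX + "Berlin", EX + "capitalOf", EX + "Germany"),
    (EX + "Germany", EX + "memberOf", EX + "EU"),
    (EX + "Paris", EX + "capitalOf", EX + "France"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Rough analogue of: SELECT ?city WHERE { ?city ex:capitalOf ?country }
capitals = sorted(ts for (ts, tp, to) in match(triples, p=EX + "capitalOf"))
print(capitals)
```

A real SPARQL engine adds joins over shared variables, filters and optional patterns, but the pattern-matching core is the same idea.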
Table 2.3: Examples of Semantic Web systems [17]

SW System | Country | Activity area | Application area of SWT | SWT used | SW technology benefits
A Digital Music Archive (DMA) for NRK using SW techniques | Norway | Broadcasting | IS, CD, & DI | 1, 2, 3 & IHV | IS, INR, S&RD
A Linked Open Data Resource List Management Tool for Undergrad Stds | UK | ELT and publishing | CD, CM, DI, and SA | RDF, 3, 1, SKOS & PV | ECR, P, reduced time to market, and S&RD
A Semantic Web Content Repository for Clinical Research | US | HC and PI | DI | 1, 2, 4, 5, and PV | Automation, IM, and IS
An Intelligent Search Engine for Online Services for Public Admin. | Spain | PI and eG | Portal and IS | 1 and IHV | ECR, INR, and IS
An Ontology of Cantabria's Cultural Heritage | Spain | PI and museum | Portal and DI | 1 and IHV | ECR and IM
Composing Safer Drug Regimens for the Individual Patient using SWT | US | HC | DI and IS | 1, 2, PV, and IHV | S&RD, open model, IS, and DCG
CRUZAR: An application of semantic matchmaking for eTourism in the city of Zaragoza | Spain | PI and eTourism | Portal and DI | 1, 3, Rules, PV, and IHV | Faceted navigation, P, and S&RD
Enhancement and Integration of Corporate Social Software Using SW | France | Utilities and energy | DI, CM, and SN | 1, 3, and PV | IS, S&RD, and INR
Enhancing Content Search Using SW | US | IT industry | Portal and IS | 1 | IS and S&RD
Geographic Referencing Framework | UK | PI, GIS & eG | DI | 1, 2 & IHV | S&RD and automation
Improving the Reliability of Internet Search Results Using Search Thresher | Ireland | Web accessibility | IS | 1 | IS
Improving Web Search Using Metadata | Spain and US | Search | Portal, IS, CD, and customization | 1, 2, RDF, 4, PV, and IHV | P, open model, and S&RD
KDE 4.0 Semantic Desktop Search and Tagging | Germany | Semantic desktop | DI, CD, IS, service integration, and SA | 1, 3, 2 and RDFS++ | IS, IM, and open model
POPS: NASA's Expertise Location Service Powered by SW Technologies | US | PI | DI and SN | 1 and 3 | Faceted navigation, S&RD, and ECR
Prioritization of Biological Targets for Drug Discovery | US | Life sciences | DI and portal | 1, 3, 2, 2DL, PV, and IHV | IS, S&RD, IM, and ECR
Real Time Suggestion of Related Ideas in the Financial Industry | Spain | Financial | CD | 1 and IHV | IS
S_Ctnt_Desc to improve discovery | UK | Telecom | CD and IS | 1, PV & IHV | S&RD, open model, and IS
Semantic MDR and IR for National Archives | Korea | PI | Portal, CD, and SA | 1, 2, Rules, PV, and IHV | IS
Semantic Tags | Serbia | IT industry | SA, SN, and DI | 1, 3 and PV | INR, IS, S&RD, and DCG
SWT for Public Health Awareness | US | HC, PI & eG | DI | 1, 2, PV & IHV | S&RD and IM
Semantic-based Search and Query System for the Traditional Chinese Medicine Community | China | PI and HC | DI, IS and schema mapping | 2, 3, and PV | S&RD and IS
The SW for the Agricultural Domain, Semantic Navigation of Food, Nutrition and Agricultural Journal | Italy | PI and eG | Portal, IS, SA, and CD | 1, 2, SKOS, PV, and IHV | IS
The Swordfish Metadata Initiative: Better, Faster, Smarter Web Content | US | IT industry | Portal and DI | 1 and IHV | DCG and ECR
Twine | US | IT industry | SA, SN, and DI | RDF, 1, 2, 3 | INR, IS, S&RD, and DCG
Use of SWT in Natural Language Interface to Business Applications | India | IT industry | Natural language interface | 1, 2, 5, 3, and IHV | IM and ECR

Abbreviations and integers used in the above table: SWT (Semantic Web Technologies), IHV (In-House Vocabularies), IS (Improved Search), IM (Incremental Modeling), CD (Content Discovery), DI (Data Integration), INR (Identify New Relationships), S&RD (Share and Reuse Data), PV (Public Vocabularies), ECR (Explicit Content Relationships), SA (Semantic Annotation), SN (Social Network), GIS (Geographic Information System), ELT (Education, Learning Technology), CM (Content Management), DCG (Dynamic Content Generation), P (Personalization), PI (Public Institution), HC (Health Care), eG (eGovernment), 1 (RDFS), 2 (OWL), 3 (SPARQL), 4 (GRDDL), 5 (Rules, N3).
2.2 The Web Ontology
An ontology formally provides structural knowledge of a domain and its data in a machine-understandable
way, using W3C-recommended [18] technologies such as RDFS [19] and OWL [20] for formalization. It is an
essential component of any Semantic Web application; in fact, it is the central component in the overall
layered architecture of the Semantic Web, as shown in Figure 2.1. In an ontology, information resources
are connected in such a way that each is uniquely identified by a URI, and new information can be drawn
through a reasoning process. Basically, an ontology is a special type of network of Web resources. A
conceptual view of an ontology can be given as an RDF graph or in triple form. Consider the example of a
person's family ontology shown in Figure 2.2; by the definition above, it is a special kind of network of
information resources. The only relations explicitly given in this ontology are isFatherOf, isMotherOf,
isBrotherOf and isSisterOf, but several types of new facts can easily be derived, such as isSonOf(3,1)
(i.e., Humza is the son of Farooq), isHusbandOf(8,9), isWifeOf(9,8), isGrandFatherOf(8,3),
isGrandmotherOf(9,3), and many more. The ontology can be implemented in a logic-based language such as
OWL, and conceptually it can be seen as triples or an RDF graph.
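The derivation of new facts from the explicit family relations can be sketched as rule application over sets of pairs. The individuals (numeric IDs) and base facts below are invented to stand in for Figure 2.2; only the rule patterns matter, and a real reasoner would derive them from OWL axioms rather than hand-written set comprehensions.

```python
# Explicit base facts (invented for illustration):
# isFatherOf(x, y) means x is the father of y, and likewise for mothers.
father = {(1, 3), (8, 1)}
mother = {(2, 3), (9, 1)}

parent = father | mother

# Rule: isHusbandOf(x, y) if x fathered and y mothered the same child
# (a simplification of what a real family ontology would assert).
husband = {(f, m) for (f, c1) in father for (m, c2) in mother if c1 == c2}

# Rule: isGrandFatherOf(x, z) if x is the father of some parent of z.
grandfather = {(f, gc) for (f, c) in father for (p, gc) in parent if p == c}
grandmother = {(m, gc) for (m, c) in mother for (p, gc) in parent if p == c}

print(sorted(husband))      # [(1, 2), (8, 9)]
print(sorted(grandfather))  # [(8, 3)]
print(sorted(grandmother))  # [(9, 3)]
```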
Figure 2.1: A layered model of semantic web [21].
Figure 2.2: A sample slice of a person's family ontology
2.3 Engineering Ontologies for Semantic Web
We have reviewed the literature concerning the adaptation of existing Web development methodologies
for Semantic Web applications. The findings disclose that not much work has been done in this direction.
An adaptation of XML Web engineering methodologies has been proposed in the form of WEESA [22], which
generates semantic annotations by defining a mapping between XML schemas and existing ontologies. It
provides no proper guidelines for designing an ontology: WEESA cannot directly design or develop
ontologies, as it focuses on mapping XML schema concepts onto concepts of an existing ontology. In [23],
the authors have
proposed an extension of the Web Site Design Method, in which object chunk entities are mapped to
concepts in the ontology. OOHDM has been extended in the light of Semantic Web technologies [24]: its
primitives for conceptual and navigation models are described as DAML classes, and RDFS is used for the
domain vocabulary. The Hera methodology [25] has been extended for adaptive Web-based application
engineering; it uses Semantic Web standards to represent its models.
The existing Web engineering methodologies mentioned above have adapted the use of ontologies in their
development processes, but their main focus is mapping and annotation using existing ontologies. The
design of new ontologies during Semantic Web application development remains largely unaddressed; there
is no proper guidance on how to design an ontology for a Semantic Web application.
There are other methodologies for ontology development, as surveyed in [26, 27, 28]. Mostly these
methodologies focus on the specification and formalization of an ontology and do not concentrate on its
design phase. They are based on natural language processing (NLP) and machine learning techniques, and
their orientation is the facilitation of Web agents rather than the formalization of Web content. Work on
ontology development was boosted when the idea of the Semantic Web was envisioned.
In the KBSI IDEF5 methodology [29], data about the domain is collected and analyzed, and then a
build-and-fix strategy is used to create the ontology. Uschold and King [30] proposed an ontology
development methodology in which, after identifying the purpose of the ontology, it is captured and then
coded. In METHONTOLOGY [31], after preparing the ontology specification, knowledge is acquired and
analyzed to determine domain terms such as concepts, relations and properties; formalization then starts,
after which evaluation and documentation are performed.
In [32] a methodology based on a collaborative approach has been proposed. In its first phase, the
design criteria for the ontology, its boundary conditions, and a set of standards for evaluating it are
defined. In the second phase, an initial version of the ontology is produced, and then through
an iterative process the desired ontology is obtained. A software build-and-fix approach is followed,
which leads to heavy development and maintenance costs.
Helena and João [33] proposed a methodology for ontology construction in 2004. This method divides
ontology construction into steps such as specification, conceptualization, formalization, implementation
and maintenance, with knowledge acquisition, evaluation and documentation performed during each phase.
There are also approaches that investigate the transformation of a relational model into an ontological
model; in these approaches, the ontology is developed from database schemas, mainly using reverse
engineering. In [34], a methodology is outlined for constructing an ontology from conceptual database
schemas using a mapping process, carried out by taking into consideration the logical database model of
the target system.
Most of the methodologies overviewed above are based on the build-and-fix approach, in which a first
version of the ontology is built and improved iteratively until the domain requirements are satisfied. In
this way the basic principles of software engineering are not followed properly. These methodologies
mainly focus on data during the development process rather than on descriptive knowledge; they mainly
work on the specification and implementation phases, while the design phase lacks proper attention.
Moreover, their design and implementation phases are difficult to identify and separate.
One of the challenges in real-world applications is to improve the access to and sharing of knowledge
that resides in databases. Domain knowledge to be shared needs to be formalized in a formal ontology
language, and extracting and building a Web ontology on top of a relational database (RDB) is one way to
represent domain-specific knowledge. Moreover, information residing in RDBs is highly engineered for
accessibility and scalability [35] and is characterized by high quality and a rapidly increasing
correspondence to the surface Web [36]. In schema mapping techniques, the idea is to convert an RDB
schema to an ontology based on predefined schema mapping rules. Various research groups have proposed
various such techniques.
Automapper [37] is a Semantic Web interface for RDBs that automatically generates the data source
ontology and the respective mappings. It uses the RDB schema to create an OWL ontology and dynamically
produces instance data using that ontology; the translation process relies on a set of
data-source-to-domain mapping rules, and a processing module called the Semantic Bridge for RDB
translates queries. Automapper is an application-independent tool, and this method quickly exposes the
data to the Semantic Web, where a variety of tools and applications are available to support translation,
integration, reasoning, and visualization.
The following class descriptions, axioms and restrictions are currently generated by Automapper:
- maxCardinality is set to 1 for all nullable columns and is used for descriptive purposes.
- minCardinality is set to 1 for all non-nullable columns and is used for descriptive purposes.
- All datatype and object properties that represent columns are marked as functional properties. To ensure global uniqueness and class specificity, these columns are given URIs based on concatenating the table and column names.
- An allValuesFrom restriction reflects the datatype or class associated with each column and is used for descriptive purposes.
Table 2.4 lists the contents of the Departments table.
Table 2.4: Departments table

ID*  NAME
1    System Solutions
2    Research and Development
3    Management

* ID is the primary key.
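The restriction rules listed above can be sketched as a small function. This is not Automapper's actual code, only an illustration of the stated rules: nullable columns get maxCardinality 1, non-nullable columns get minCardinality 1, an allValuesFrom restriction records each column's datatype, and property URIs concatenate table and column names. The `dsont:` prefix and the column list (the Departments table of Table 2.4) are taken from the surrounding text; the function itself is an assumption.

```python
def column_restrictions(table, columns):
    """Sketch of Automapper-style restrictions.

    columns: list of (name, datatype, nullable) tuples.
    Returns (property URI, restriction, value) triples.
    """
    out = []
    for name, dtype, nullable in columns:
        # URI concatenates table and column names for global uniqueness.
        prop = f"dsont:{table.lower()}.{name.lower()}"
        card = "owl:maxCardinality" if nullable else "owl:minCardinality"
        out.append((prop, card, 1))
        out.append((prop, "owl:allValuesFrom", dtype))
    return out

rows = column_restrictions("Departments",
                           [("ID", "xsd:decimal", False),   # primary key
                            ("NAME", "xsd:string", True)])  # nullable
for r in rows:
    print(r)
```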
From this schema, Automapper creates the data source ontology and class-specific inverse functional
rules, as shown in Figure 2.3.
Figure 2.3: Ontology and inverse functional rules created by Automapper
XTR-RTO [38] is an approach to build an OWL ontology from eXtensible Markup Language (XML)
documents. Its transformation module first maps the source XML schema into an RDB and then maps the RDB
into an OWL ontology. Specific attributes are used to describe the RDB, such as rdb:DbName, rdb:Relation,
rdb:RelationList, rdb:Table and rdb:Attribute. All tables are mapped to instances of type rdb:Relation
and consequently added to rdb:RelationList, whereas each attribute is mapped to an instance of type
rdb:Attribute together with an instance of type rdb:hasType.
dsont:Hresources.Departments a owl:Class ;
    rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty dsont:hresources.departments.ID ;
          owl:allValuesFrom xsd:decimal ] ,
        [ a owl:Restriction ;
          owl:onProperty dsont:hresources.departments.ID ;
          owl:minCardinality "1"^^xsd:nonNegativeInteger ] .
The entity-relationship model is currently the most popular style for organizing a database and can
express the relationships between data clearly. Therefore, metadata information and structural
restrictions are extracted from the relational database to construct the ontology. The ontology contains:
- Vocabularies for describing relational database systems, such as rdb:DBName, rdb:Relation, rdb:RelationList, rdb:Table, rdb:Attribute, rdb:PrimaryKeyAttribute, and rdb:ForeignKeyAttribute.
- Semantic relationships between vocabularies, such as rdb:hasRelation, rdb:hasAttribute, rdb:primaryKey, rdb:hasType and rdb:isNullable.
- Restrictions on the vocabularies and their semantic relationships, such as: each relation has zero or more attributes, and each attribute has exactly one type.
The mapping approach used in XTR-RTO is as follows:
- Each table is mapped to an instance of type rdb:Relation and added to rdb:RelationList.
- Each attribute is mapped to an instance of type rdb:Attribute, and an instance of type rdb:hasType is generated simultaneously. If the attribute is a foreign key, an instance of type rdb:ReferenceAttribute and an instance of type rdb:ReferenceRelation are generated to represent this information.
- The restrictions on each instance of type rdb:Attribute, such as cardinality restrictions and foreign key restrictions, are generated.
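The mapping steps above can be sketched as a loop over a toy schema. The rdb: vocabulary terms come from the text; the schema dictionary, its contents and the triple encoding are invented for illustration and are not XTR-RTO's actual data structures.

```python
# Hypothetical input schema: one table with a foreign key (the BOOK
# table of the running example, plus an assumed PRINTER target table).
schema = {
    "BOOK": {
        "attributes": ["BOOK_ID", "TITLE", "AUTHOR", "PRINTER_ID"],
        "foreign_keys": {"PRINTER_ID": "PRINTER"},
    },
}

triples = []
for table, info in schema.items():
    # Step 1: each table becomes an rdb:Relation in rdb:RelationList.
    triples.append((table, "rdf:type", "rdb:Relation"))
    triples.append(("rdb:RelationList", "rdb:hasRelation", table))
    for attr in info["attributes"]:
        # Step 2: each attribute becomes an rdb:Attribute of the relation.
        triples.append((f"{table}.{attr}", "rdf:type", "rdb:Attribute"))
        triples.append((table, "rdb:hasAttribute", f"{table}.{attr}"))
        if attr in info["foreign_keys"]:
            # Foreign keys additionally become reference attributes
            # pointing at the referenced relation.
            target = info["foreign_keys"][attr]
            triples.append((f"{table}.{attr}", "rdf:type",
                            "rdb:ReferenceAttribute"))
            triples.append((f"{table}.{attr}", "rdb:ReferenceRelation",
                            target))

print(len(triples))
```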
There is one table in this relational database, as illustrated in Table 2.5.

Table 2.5: BOOK table in the relational model

BOOK_ID  TITLE  AUTHOR  PRINTER_ID

Suppose the database is stored on the local host, and the OWL ontology describing it has the namespace
http://localhost/book.owl, which can be changed by users. Figure 2.4 illustrates the relations in the
ontology.
Figure 2.4: Relations in the ontology
The OWL description of the BOOK table is shown in Figure 2.5.
Figure 2.5: OWL description of BOOK Table
RTAXON [39] uses a data mining approach: a learning technique exploits the database content to identify
categorization patterns, which are then used to generate class hierarchies. This fully formalized method
combines a classical schema analysis with hierarchy mining (extraction) in the data. The learning method
focuses on concept hierarchy identification via the role of attributes in a relation. Based on the
assumption that attribute names have a specific role in the relation, the approach identifies lexical
clues in attribute names that reveal their role (i.e., classifying the tuples).
2.3.1 Identification of the categorizing attributes:
Two sources are involved in the identification of categorizing attributes: the names of attributes and
the data diversity in attribute extensions (i.e., in column data). These two indicators allow candidate
attributes to be found and the most plausible one to be selected.
a. Identification of lexical clues in attribute names: Attributes bear names that reveal their specific
role in the relation (i.e., classifying the tuples) used for categorization. The lexical clue that
indicates this role may be only part of the name, as in the attribute names CategoryID or ObjectType. In
Fig. 2.6, for example, the categorizing attribute in the
Products relation is clearly identified by its name (i.e., Category). A list of clues is set up and used
to perform a first filtering of potential candidates.
b. Filtering through entropy-based estimation of data diversity: With an extensive list of lexical
clues, the first filtering step appears effective; for example, the Category column in the Products
relation can be used to derive subclasses. However, experiments on complex databases show that this step
often identifies several candidates. The selection among the remaining candidates is based on an
estimation of the data diversity in the attribute extensions. A good candidate should exhibit some
typical degree of redundancy, which can be formally characterized using the concept of entropy from
information theory. Entropy is a measure of the uncertainty of a data source: attributes with highly
repetitive content have low entropy, whereas, among the attributes of a given relation, the primary key
has the highest entropy, since all values in its extension are distinct. Informally, the rationale behind
this selection step is to favor the candidate that would provide the most balanced distribution of
instances within the subclasses.
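The entropy estimate above can be made concrete with a short computation. The Products data below is invented for illustration (RTAXON's actual datasets are not reproduced here); it only shows why a primary key maximizes entropy while a categorizing column with repeated values scores much lower.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a column's value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical Products relation: a key column and a candidate
# categorizing attribute identified by the lexical clue "Category".
products = {
    "ProductID": [1, 2, 3, 4, 5, 6],             # all values distinct
    "Category":  ["Beverage", "Beverage", "Dairy",
                  "Dairy", "Dairy", "Beverage"],  # highly repetitive
}

h_id = entropy(products["ProductID"])   # log2(6), the maximum for 6 rows
h_cat = entropy(products["Category"])   # 1.0 bit: two equally likely values
print(round(h_id, 3), round(h_cat, 3))
```

The key's entropy is maximal, so it is rejected; Category's low, balanced entropy makes it the plausible source of subclasses.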
2.3.2 Generation and population of the subclasses:
The generation of subclasses from an identified categorizing attribute can be straightforward: a
subclass is derived for each distinct value of the attribute extension (i.e., for each element of the
attribute's active domain). However, proper handling of the categorization source may require more
complex mappings.
Example
As illustrated in Fig. 2.6, the derivation applied in this example can be divided into two inter-related
parts. The first part, labeled (a) in the figure, includes derivations motivated by the identification of
patterns in the database schema. Each relation (or table) definition from the relational database schema
is the source of a class in the ontology. To complete the class definitions, datatype properties are
derived from some of the relation attributes. The foreign key relationships are the most reliable source
for linking classes, and in this example each relationship is translated into an object property. The
derivations applied to obtain this upper part of the
2.3.3 Rules for learning classes
When learning ontological classes, one class may derive from information spread across several
relations. These rules integrate the information from several relations into one ontological class when
those relations describe a single entity.
2.3.4 Rules for learning properties and property characteristics
There are two kinds of properties: object properties and datatype properties. These rules learn object
properties from the reference relationships between relations. In a relational model, a relation may be
used to indicate the relationship between two other relations; in this case, such a relation can be
mapped to an ontological object property. These rules learn object properties using the binary
relationships between relations; complex or non-binary (n-ary) relations are not supported by ontology
languages and need to be decomposed into groups of binary relations. Each attribute of a relation that
cannot be converted into an ontological object property is converted into an ontological datatype
property.
2.3.5 Rules for learning hierarchy
Classes and properties can be organized in a hierarchy. If two relations in the database have an
inheritance relationship, then the two corresponding ontological classes or properties are organized in a
hierarchy.
2.3.6 Rules for learning cardinality
The ontology defines property cardinalities to further specify properties. A property's cardinality is
learned from the constraints on the attributes of relations. If an attribute is a primary or foreign key,
then the minCardinality and maxCardinality of the property are 1. If an attribute is declared NOT NULL,
the minCardinality of the property is 1. Furthermore, if an attribute is declared UNIQUE, the
maxCardinality of the property is 1.
2.3.7 Rules for learning instances
For an ontological class, its instances consist of the tuples of the relations corresponding to the
class, and relations between instances are established using the information contained in the foreign
keys of the database tuples.
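The rules of Sections 2.3.3-2.3.7 can be combined into one sketch. The two-table schema below is hypothetical (not taken from the cited work [40]), and the dictionaries are a deliberate simplification of what a real learner would emit as OWL axioms; the sketch only shows the rule logic: one class per relation, foreign keys becoming object properties, other attributes datatype properties, and key/NOT NULL/UNIQUE constraints setting cardinalities.

```python
# Hypothetical relational schema with constraint metadata.
schema = {
    "AUTHOR": {"columns": {"AUTHOR_ID": {"pk": True},
                           "NAME": {"not_null": True}},
               "fks": {}},
    "BOOK": {"columns": {"BOOK_ID": {"pk": True},
                         "TITLE": {"unique": True},
                         "AUTHOR_ID": {}},
             "fks": {"AUTHOR_ID": "AUTHOR"}},
}

ontology = {}
for table, info in schema.items():
    props = {}
    for col, c in info["columns"].items():
        # Rule 2.3.4: foreign keys -> object properties, the rest ->
        # datatype properties.
        kind = "object" if col in info["fks"] else "datatype"
        # Rule 2.3.6: keys imply min = max = 1; NOT NULL implies
        # min = 1; UNIQUE implies max = 1.
        card = {}
        if c.get("pk") or col in info["fks"]:
            card = {"min": 1, "max": 1}
        else:
            if c.get("not_null"):
                card["min"] = 1
            if c.get("unique"):
                card["max"] = 1
        props[col] = {"kind": kind, "cardinality": card}
    # Rule 2.3.3: one ontological class per relation describing an entity.
    ontology[table] = props

print(ontology["BOOK"]["AUTHOR_ID"])
```

Instances (rule 2.3.7) would then be produced by iterating over the tuples of each table and linking them via the foreign key values.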
According to the above-mentioned rules, the ontology is constructed automatically from the data in the
relational database.
2.3.8 Implementation
The overall framework of ontology learning is presented in Fig. 2.7. The input to the framework is the
data stored in a relational database. The framework uses a database analyzer to extract schema
information from the database, such as primary keys, foreign keys and dependencies. The obtained
information is then transferred to the ontology generator, which generates the ontology based on the
schema information and the rules. As a last step, the user can modify and refine the obtained ontology
with the aid of an ontology reasoner and an ontology editor. The framework is domain/application
independent and can learn ontologies for general or specific domains from a relational database. Fig. 2.7
illustrates the ontology construction framework.
Figure 2.7: Ontology Construction Framework
(Figure components: RDB, RDB Analyzer, Ontology Generator, Rules Lib, Ontology, Ontology Reasoner,
Ontology Editor, Ontology-Based Application)
As this survey shows, each approach defines common rules for mapping basic RDB schema patterns (such as
relations, properties, reference keys and cardinalities) to ontology constructs, as shown in Table 2.6.
Table 2.6: Comparative analysis of ontology construction approaches

Approach | Mapping Creation | Procedure | Sources Used | Mapping Rules | Limitations
Automapper: Relational Database Semantic Translation using OWL and SWRL [37] | Automatic | Construction of ontology from RDB with the help of a configuration file | Configuration file, RDB schema, mapping rules | Class, datatype property, and object property | Un-normalized RDB; un-resolvable URIs
Using Relational Database to Build OWL Ontology from XML Data Sources [38] | Semi-automatic | Construction of ontology from RDB | RDB schema, mapping rules | rdb:Relation, rdb:Attribute, and rdb:ReferenceAttribute | Un-normalized RDB; needs domain experts' help for extracting cardinality restrictions
Mining the Content of Relational Database to Learn Ontology with Deeper Taxonomies [39] | Semi-automatic | Schema analysis and hierarchy mining from stored data | RDB schema, mapping rules, stored data | Class, datatype properties, object property, and M:M object property | Sometimes the attribute name does not represent its value
Learning Ontology from Relational Database [40] | Automatic | Construction of ontology from RDB without using a middle model | RDB schema, mapping rules | Classes, properties, property characteristics, cardinalities and data instances | Un-normalized RDB
An ontology is built either from scratch using an ontology editor or by leveraging a database (or
document collection) via (semi-)automatic ontology construction. The topic of ontology construction has
been receiving growing attention, since different applications focus on different aspects; the basic
idea, however, is to provide vocabularies of concepts and their relationships within a specific domain.
Much work has been done on ontology construction using a relational database as the data source.
As described in Table 2.6, the techniques used in ontology construction from relational databases are
based on schema mapping and data mining approaches.
Automapper [37] automatically generates the data source ontology and the respective mappings for an
RDB. The translation process relies on a set of data-source-to-domain mapping rules, and these mapping
rules depend on a well-formed relational database schema. In many applications a well-formed relational
database schema is not available, so good results in the ontology construction process are not
guaranteed.
XTR-RTO [38] uses metadata extracted from a relational database to construct an ontology based on
predefined schema translation rules. The effectiveness of these translation rules depends upon a well-formed
relational database schema; when a well-designed database is unavailable, the ontology construction
process becomes challenging.
RTXON [39] identifies lexical clues in attribute names during the filtering process. In many applications,
attribute names do not always represent their values (offering no lexical clue), so good results cannot be
guaranteed in the filtering process.
The approach used in [40] acquires an ontology from a relational database automatically by using a group
of learning rules. These learning rules depend upon the relational database schema, and in many
applications the unavailability of a well-formed relational database results in inconsistent and incorrect
ontology construction.
2.4 Populating Web Ontologies
Automatic population of web ontologies is a current research issue. There are two stages of web-ontology
development: creation of the ontology (schema) and its population. Although populating a web ontology is
not a complicated task, it is very time-consuming and laborious when performed manually. For the success
of the Semantic Web, a proper solution for populating ontologies is needed. On the current web, most data
is available in XML data files, and there is an extremely large number of these data resources, containing
terabytes of data. To upgrade such data-intensive web applications into semantic web applications, proper
methodologies for automatic population of ontologies are needed.
Different approaches have been proposed for ontology population. Some of them are based on natural
language processing and machine learning techniques [41][42]. Junte Zhang and Proscovia Olango [43]
presented a novel approach for creating and populating ontologies. In this method, domain knowledge
about the ontology is collected and a domain ontology is constructed using the open-source tool Protégé.
This domain ontology is transformed into an equivalent RDF file, which is manipulated manually to
populate the ontology skeleton created by Protégé. XSLT or XQuery is used to extract the relevant
information from Wikipedia pages into Perl regular expressions, and ontology instances are then generated
using those expressions. Semantic heterogeneity and inconsistency problems arose while exporting
Wikipedia pages to XML format and remained unsolved.
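The pattern-based extraction step of such approaches can be illustrated with a small sketch: semi-structured text is scanned with a regular expression, and each match becomes an ontology instance with a few property assertions. The pattern, the Person class, and the birthYear property below are illustrative assumptions, not the actual patterns of [43].

```python
import re

# Toy sketch of regular-expression-based instance extraction from
# Wikipedia-style text. The pattern and vocabulary are assumptions.
text = """
Ada Lovelace (born 1815) was a mathematician.
Alan Turing (born 1912) was a computer scientist.
"""

pattern = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) \(born (?P<year>\d{4})\)"
)

instances = []
for m in pattern.finditer(text):
    uri = m.group("name").replace(" ", "_")       # instance identifier
    instances.append((uri, "rdf:type", "Person")) # class membership
    instances.append((uri, "birthYear", m.group("year")))
```

As the text notes, such extraction is brittle: any page whose wording deviates from the expected pattern yields no instance, which is one source of the inconsistency problems reported.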
Guidelines for web-ontology creation and population while developing new semantic web applications
with WSDM are provided in [44][45]. Concepts in the ontology are mapped to object chunks manually at
the conceptual level; this conceptual mapping is then used to generate the actual semantic web page at the
implementation level. A similar approach is used in WEESA [46], an adaptation of XML-based web
engineering: web-ontology concepts are mapped to schema elements of an XML document, the mapping is
defined for each page, and the ontology is populated via a tool [47]. In [48], a methodology is proposed for
extracting data from web documents for ontology population. It consists of three steps. The first step
extracts information in the form of sentences and paragraphs; the web documents are selected using search
engines or manually. The system then understands this information semantically and syntactically,
including the relations between the terms of the text, using rhetorical structures. For efficient
representation of the extracted information, XML is used due to its flexibility and its abilities for handling
data. We proposed an ontology population methodology [49] to populate an ontology from data stored in
XML data files. This methodology may help in transforming an existing non-semantic web application into
a semantic web application by populating its web ontology semi-automatically through a set of
transformation algorithms, reducing the time-consuming task of ontology population. Similar work is
presented in [50]. The proposed methodologies take a web-ontology schema and an XML document as
input and produce a populated ontology as output.
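The ontology-schema-plus-XML-document idea can be sketched minimally: given a class with its datatype properties and an XML file whose elements carry matching child tags, each element becomes an instance. The element names, the Book class, and the schema dictionary are illustrative assumptions, not the actual transformation algorithms of [49] or [50].

```python
import xml.etree.ElementTree as ET

# Minimal sketch of XML-driven ontology population: a web-ontology schema
# (a class and its datatype properties) plus an XML document yield a
# populated ontology, here represented as a list of triples.
xml_data = """
<books>
  <book id="b1"><title>Semantic Web</title><year>2004</year></book>
  <book id="b2"><title>OWL Primer</title><year>2009</year></book>
</books>
"""

# Assumed ontology schema: class Book with properties title and year.
schema = {"class": "Book", "properties": ["title", "year"]}

def populate(xml_text, schema):
    triples = []
    root = ET.fromstring(xml_text)
    for elem in root:
        instance = elem.get("id")                 # instance identifier
        triples.append((instance, "rdf:type", schema["class"]))
        for prop in schema["properties"]:
            child = elem.find(prop)               # property value, if present
            if child is not None:
                triples.append((instance, prop, child.text))
    return triples

triples = populate(xml_data, schema)
```

The appeal of this style of methodology is that the XML document supplies the instances while the ontology schema stays fixed, so re-running the transformation on new data files keeps the populated ontology up to date.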
2.5 Integrating Web Ontologies
When multiple ontologies are used simultaneously in integration operations such as merging,
mapping, and aligning, they may suffer from different types of heterogeneity, such as semantic
heterogeneity and non-semantic (syntactic) heterogeneity [51, 52]. Syntactic heterogeneity is due to
the use of different languages for the formalization of ontologies, e.g., one ontology is formalized in OWL
[20] and the other in the DAML [53] language. Semantic heterogeneity consists of terminological,
conceptual, and contextual heterogeneities. Terminological heterogeneity occurs when different terms are
used to represent the same concept or the same term is used to represent different concepts. Conceptual
heterogeneity between two concepts may occur due to the granularities of the concepts, i.e., when one is a
sub-concept or super-concept of the other or the two overlap. Similarly, two concepts are explicit-semantic
heterogeneous when they have different roles or functionalities in a similar domain.
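Terminological heterogeneity, the simplest of these, can be sketched with a toy matcher that treats two concept labels as naming the same concept when they are identical or belong to a shared synonym group. The synonym table below is an illustrative assumption; real alignment systems draw on lexical resources such as WordNet rather than a hand-written dictionary.

```python
# Toy sketch of detecting terminological heterogeneity between two
# ontologies: different terms, same concept. The synonym groups are
# illustrative assumptions standing in for a lexical resource.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "professor": {"faculty", "lecturer"},
}

def same_concept(term_a, term_b):
    """True if the two labels plausibly denote the same concept."""
    a, b = term_a.lower(), term_b.lower()
    if a == b:
        return True
    for canon, syns in SYNONYMS.items():
        group = syns | {canon}
        if a in group and b in group:
            return True
    return False
```

The converse case, the same term denoting different concepts, cannot be caught by label comparison alone and is one reason conceptual and contextual heterogeneity require deeper, structure-aware analysis.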
In an ontology model, the taxonomic and non-taxonomic relations of a concept represent its super-/sub-
concepts and its roles, respectively. Since, in a certain domain, an intellectual concept is explicitly defined
in terms of the roles it holds, we may call the roles of a concept its explicit semantics. However, the explicit
semantics of a non-intellectual concept may be derived from its granularities and its physical attributes
because of the typical nature of such domains; e.g., in an ontology of the furniture domain, concepts like
chair, table, and desk have no intellectual properties; these concepts have only taxonomic (i.e., parent,
child, sibling) and elementary (i.e., color, type, etc.) characteristics.