An Approach to Owl Concept Extraction and Integration Across Multiple Ontologies


    International Journal of Web & Semantic Technology (IJWesT) Vol.3, No.3, July 2012

    DOI : 10.5121/ijwest.2012.3303

    AN APPROACH TO OWL CONCEPT EXTRACTION AND INTEGRATION ACROSS MULTIPLE ONTOLOGIES

    Nadia Imdadi1 and Dr. S.A.M. Rizvi2

    Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi

    [email protected] [email protected]

    ABSTRACT

    The growing number of ontologies on the Semantic Web, together with the endorsement of OWL as its language of discourse, has led to a scenario in which research in ontology engineering can make ontology development through reuse a viable option for ontology developers. The advantages are twofold. When existing ontological artefacts from the Semantic Web are reused, semantic heterogeneity is reduced, which helps interoperability, the essence of the Semantic Web. From the perspective of ontology development, reuse cuts down on cost as well as development time, since ontology engineering requires expert domain skills and is a time-consuming process. We have devised a framework to address the challenges associated with reusing ontologies from the Semantic Web. In this paper we present the methods adopted for extraction and integration of concepts across multiple ontologies. The extraction method is based on features of OWL language constructs and on context, and for integration a relative semantic similarity measure is devised. We also present guidelines for evaluating the constructed ontology. The proposed methods have been applied to concepts from a food ontology, and evaluation has been done on concepts from the domain of academics using the Golden Ontology Evaluation Method, with satisfactory outcomes.

    KEYWORDS

    Ontology Engineering, Ontology Creation, OWL Concepts, Golden Ontology Evaluation Method

    1. INTRODUCTION

    Ontologies are conceptual representations of domains in a formal language that make data machine processable over the web. They are key elements that allow knowledge to be represented in a structured way so that a higher degree of interoperability amongst the various heterogeneous resources on the web may be achieved. They are the hinges upon which the Semantic Web is built. A key factor for the success of the Semantic Web is the availability of technologies for the efficient and effective reuse of ontological knowledge.

    Ontology engineering, the process of building an ontology, is a time-consuming activity which also requires domain-specific skills, generally supplied by experts of a particular field. Approaches to ontology development can be broadly categorized into two areas: creation from scratch, and reuse, which generally takes the form of merging, integration, alignment, mapping or translation. The former is painstaking, while the latter makes use of already developed formal domain representations and, though it requires diligent attention, it definitely cuts down on development time.


    With the standardization and maturity of semantic web languages that support description logics, ontologies on the web have mushroomed and are on the rise. The availability of these semantic resources augurs well, as they help to achieve the idea of the semantic web and form the necessary infrastructure where software agents can make decisions by inferring knowledge from a variety of resources. Reuse of existing ontological knowledge on the web to build ontologies for the semantic web may be helped by efforts in the field of ontology engineering, and vice versa.

    This research is a continuation of our previous work [1-4], where foundations for a global framework for automatic semantic integration incorporating semantic repositories are presented, based on [5][6], in which the key stages of ontology development through reuse were identified, namely ontology discovery, selection, integration and evaluation, and possible approaches were elaborated. Two paradigms have guided the formulation of strategies to address the various issues at each stage of ontology construction: i) the principle of a modular approach and ii) a suitable mix of human and computational skills.

    2. GLOBAL FRAMEWORK FOR AUTOMATIC SEMANTIC INTEGRATION INCORPORATING SEMANTIC REPOSITORIES

    2.1. Introduction

    In [1][2] a framework was put forward with the vision of an environment that would encompass ontology development from locally available ontologies as well as those available online, that is, on the semantic web. In the global context the key components of the framework are the query handler, the semantic kernel, and the global knowledge base.

    The kernel/processor is the backbone of the framework and is the first of its kind, as its aim is to facilitate the usage of semantic repositories scattered across the semantic web by retrieving resources related in context. An important function of the kernel is execution of the global query service routine (GQSR) for discovery of online resources. The kernel receives user input and, after query processing, initiates a global service routine to search and retrieve relevant information from Swoogle's [7] index of semantic web documents.

    A bottom-up approach to ontology construction is employed [3][4], as any domain consists of concepts, which in turn are collections of a few terms, properties and relations amongst these terms. Based on these terms and properties an input matrix is created, which is then used for concept extraction from globally available knowledge resources. The bottom-up approach is suitable because concepts may be present across multiple online semantic repositories and therefore have to be identified, extracted and integrated using an appropriate strategy. In the following sections we discuss how each of the stages identified in the process of ontology development is addressed by the framework.

    2.2. Discovery of Ontologies

    In [4] the methodology to discover ontologies on the web is discussed. A novel modular approach, realised via input formulation, is adopted for discovery of ontologies. Important aspects taken into consideration are word disambiguation and context identification. The framework provides a solution in the form of the GQSR module, used for querying the semantic web for discovery of ontologies. The modular approach is implemented through input formulation, where the input is modelled to identify same-sense words by using a word sense disambiguation technique, and to identify context by careful selection of words which, when they appear together, represent a concept. Three premises serve as a guide during input modelling:

    Premise 1 - A concept may be identified when a couple of words/terms appear together.

    Premise 2 - A word may have more than one sense; an attribute or property associated with it can be used to identify its context.

    Premise 3 - In the case of structured information like that of namespaces/ontologies, context can be identified as super class (parent-of) and subclass.

    The GQSR module is implemented in Java and uses Swoogle's Web Service API [8] to access ontologies on the Semantic Web. The module makes use of a hash bucket algorithm customized for selection of potential ontologies, which are input to the next stage. The discovered ontologies are retrieved and ranked based on an aggregation function, which essentially is a summation of weighted concepts across all the inputs given by the user. For implementation details and results please refer to [4].

    3. AN APPROACH FOR OWL CONCEPT EXTRACTION AND INTEGRATION ACROSS MULTIPLE ONTOLOGIES

    3.1. Extraction Methodology

    3.1.1. Features of OWL

    This framework works with OWL ontologies, as OWL is the official web ontology language endorsed by the W3C and satisfies the requirements of a web ontology language, which should consist of constructs that support the following [9]:

    - the important concepts (classes) of a domain
    - important relationships between these concepts, which can be hierarchical (subclass relationships), other predefined relationships contained in the ontology language, or user defined (properties)
    - further constraints on what can be expressed (e.g. domain and range restrictions, cardinality constraints etc.)

    Based on these parameters, Figure 1 identifies the key constructs of the OWL language [10] that play a significant role in defining a class/concept.

    3.1.2. Class Related Significant OWL Constructs

    Each block in the figure represents some aspect of a particular class. The first block has Class as the prime predicate, together with three predicates that define the environment in which the class exists. A class is extended using the DatatypeProperty and ObjectProperty attributes. These are defined using the predicates indicated in their associated blocks, which help to put restrictions on the relations between instances of two classes as well as on the number of elements that may participate in a relationship. The DatatypeProperty of a class describes the features/attributes of that class, whereas the ObjectProperty shows the association between members of two classes. In order to learn about any class using our framework, we identify these constructs to be retrieved or aggregated across ontologies.


    Figure 1 Basic Class Building Predicates and their Associations

    Each Class/Entity can be defined as a composition of {E, Op, Dp, S}, where

    E: set of SuperClasses and set of SubClasses
    Op: set of ObjectProperties and SubProperties
    Dp: set of attributes
    S: set of classes associated with the given class as a domain or range through an object property or datatype property
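The {E, Op, Dp, S} composition can be pictured as a small record type. The following is a minimal sketch, not the framework's actual data model: the class name `ClassDefinition` and its field names are hypothetical, and the sample values are taken from the FishTopping example used in Section 3.2.2.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDefinition:
    """One extracted OWL class, following the {E, Op, Dp, S} composition.

    The type and field names are illustrative assumptions.
    """
    name: str
    superclasses: set = field(default_factory=set)         # part of E
    subclasses: set = field(default_factory=set)           # part of E
    object_properties: set = field(default_factory=set)    # Op
    datatype_properties: set = field(default_factory=set)  # Dp
    associated_classes: set = field(default_factory=set)   # S

    def hierarchy(self) -> set:
        """E: the union of the superclass and subclass sets."""
        return self.superclasses | self.subclasses


fish_topping = ClassDefinition(
    name="FishTopping",
    superclasses={"PizzaTopping"},
    subclasses={"AnchoviesTopping", "MixedSeafoodTopping", "PrawnsTopping"},
    object_properties={"hasSpiciness"},
    associated_classes={"Mild"},
)
print(sorted(fish_topping.hierarchy()))
```

Keeping each component as a set makes the later UNION and intersection operations of the integration step direct set operations.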

    3.1.3. Nature of Representation of Domain Knowledge

    Examining the ontologies retrieved using our discovery and selection method [4] threw light on the use of natural language in building these ontologies. It is common practice for a concept or class to be represented by joining two terms; for example, the term base in the context of pizza is represented as PizzaBase or Pizza-Base. In fact this naming convention is promoted by the ontology development environment Protégé, which has a feature to automatically attach one term to another when defining a hierarchy of classes. OWL documentation [11] recommends that all class names should start with a capital letter and should not contain spaces.

    3.1.4. Method of Extraction

    With this background on the nature of OWL ontologies, based on formal language constructs and natural language constructs, we have devised the extraction technique. Three things are considered during the extraction process:

    - only the class information identified in Section 3.1.2 is retrieved in relation to a class
    - classes that are represented using one or more co-joined terms from the user-defined key terms are extracted
    - classes having two terms, of which at least one belongs to the key term list, are extracted

    For example, if the user-defined term list comprises {red, white, wine}, then sample classes RedWine, WhiteWine, Wine, GrapeWine and RedGrapeWine will be returned.
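The selection rule can be sketched as a filter over class names. This is an illustration under stated assumptions rather than the authors' extractor (which is built on the OWL API): the helpers `split_terms` and `is_candidate` are hypothetical, and the rule "at least one key term and at most one non-key term" is an interpretation chosen so that the sketch reproduces the example above, including RedGrapeWine.

```python
import re


def split_terms(class_name):
    """Split a co-joined name such as 'PizzaBase' or 'Pizza-Base' into terms."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", class_name)


def is_candidate(class_name, key_terms, max_unknown=1):
    """Keep a class whose name contains at least one key term and at most
    `max_unknown` terms outside the key term list (assumed interpretation)."""
    terms = [t.lower() for t in split_terms(class_name)]
    keys = {k.lower() for k in key_terms}
    known = sum(1 for t in terms if t in keys)
    unknown = len(terms) - known
    return known >= 1 and unknown <= max_unknown


classes = ["RedWine", "WhiteWine", "Wine", "GrapeWine",
           "RedGrapeWine", "PizzaBase", "GrapeJuice"]
selected = [c for c in classes if is_candidate(c, {"red", "white", "wine"})]
print(selected)
```

PizzaBase and GrapeJuice are added here only as negative cases; they share no term with the key term list and are therefore skipped.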

    The reason for these restrictions is that ontologies vary in size: some may have as many as a hundred concepts defined, whereas others may have thousands. Since our approach is for ontology development using a modular approach, these restrictions help to draw a line and identify potential class candidates. Another benefit of this approach is that computational time and memory requirements are reduced.

    The extractors are implemented using the OWL API [12]. The extractor module retrieves the class definition, which includes the following: SuperClasses, SubClasses, DisjointWith classes and Properties associated with the classes satisfying the above criteria.


    3.2. Integration Methodology

    Integration is an important aspect, as it helps in assimilating the same classes across ontologies. For example, X is defined in ontology A with n features and X is also defined in ontology B with m features. To avoid representing X twice, i.e. if the two definitions represent the same thing, they should be combined through the UNION of their features, and some similarity measure is needed to decide this. Another aspect to consider is that a class may have the same name in two ontologies but an entirely different set of features; a UNION could then be misleading, as these classes may represent entirely different concepts. This calls for a similarity measure which weighs the similarity of features against their dissimilarity.

    3.2.1. Similarity Versus Dissimilarity

    Since one class may have more features than the other, there is a need to normalize this difference and define similarity or dissimilarity based on it. Classes may have a different number of occurrences of a predicate, and this difference needs to be accounted for. If classes with similar predicates and different counts differ in most cases, then it may be concluded that the classes are dissimilar or represent different aspects of one thing, whereas if similar predicates with different counts are similar in most instances, then the likelihood that they represent the same thing increases.

    Let us consider that PersonnelInformation is defined in two ontologies as follows.

    Example 1
    Ontology 1: PersonnelInformation - Name, Age, SSN, Address, PhoneNo.
    Ontology 2: PersonnelInformation - Name, Age, SSN

    Example 2
    Ontology A: Book - AuthorName, Title, Publisher, ISSN
    Ontology B: Book - Ticket, Status, Date

    Table 1 Similarity Vs Dissimilarity Examples

    In Example 1 of Table 1 it can be seen that Ontology 1 describes PersonnelInformation more elaborately than Ontology 2, and all the features of PersonnelInformation defined in Ontology 2 also appear in the definition in Ontology 1. Thus integration of the two definitions results in one class, which should be represented by the definition in Ontology 1.

    In general we believe that when any class is defined in an ontology, the most basic elements are associated with it; as seen in the example, name and age are basic features that will have to be defined for a class such as employee. Through the above example we would also like to highlight the fact that one ontology may have a fuller description or definition of an entity class, while in another ontology only the basic essentials are defined. Now we take another example, where differences between two classes exist even though the class names are the same.

    In Example 2 of Table 1, class Book in Ontology A represents the education domain, while it can be deduced that its namesake in Ontology B represents a concept from the travel domain. These two classes represent different concepts and should therefore be treated as separate entities.

    The above examples emphasise the need to consider similarity versus dissimilarity based on class definitions when performing integration of two classes. We therefore conclude that when defining a similarity measure for two classes, one has to take account of the relative closeness of the two classes before judging whether they represent the same concept or different ones.

    Approaches to compute semantic similarity between concepts of two classes have been put forward [13][14][15] for operations such as mapping, aligning and integrating, but these do not consider the relative similarity of one class to another. We consider this important because ontologies are developed with user-specific requirements, and ontologies of the same domain can be defined more elaborately by one group than by another.

    3.2.2 Relative Semantic Similarity Measure (RSSM)

    Since the parameters we consider are sets of features, the problem is to define a measure which considers how much of one set is contained in another. To devise a similarity measure which takes similarity versus dissimilarity into account, we have considered a set-based similarity measure, the Jaccard Index, also known as the Jaccard Similarity Coefficient [16], which is a statistic used for comparing the similarity and diversity of sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets and is given by the following formula:

    \[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]

    Dissimilarity between sample sets is known as the Jaccard distance, which is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

    \[ J_{\delta}(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} \]
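As a quick illustration of the two definitions, the sketch below computes the Jaccard index and Jaccard distance for the PersonnelInformation feature sets of Table 1. The function names are illustrative, and returning 1.0 for two empty sets is a convention assumed here, not something stated in the paper.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Complement of the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


ontology_1 = {"Name", "Age", "SSN", "Address", "PhoneNo"}
ontology_2 = {"Name", "Age", "SSN"}

print(jaccard_index(ontology_1, ontology_2))     # 3 shared features out of 5
print(jaccard_distance(ontology_1, ontology_2))
```

Swapping the two arguments gives the same values, so the index alone cannot say which definition is contained in the other.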

    The Jaccard Index and Jaccard distance show similarity and dissimilarity but do not measure the degree to which one set of features is contained in the other and vice versa. In other words, these measures compute similarity between two sets based on the presence and absence of features, but they give a normalized, symmetric value which does not reflect relative similarity. We believe it is likely that the feature set of one class may be present to a large degree in the feature set of another class while the reverse is not true, and in such a case it can still be said that the classes under consideration represent the same thing.

    Based on the above considerations, we propose a method which computes the relative similarity of the feature set of one class against the feature set of a second class and vice versa. In this way it can be found which class is more similar to the other and to what degree. For example, if C1 and C2 are two classes, then we compute relative similarity using the following formulas:

    \[ R(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1|} \tag{1} \]

    \[ R(C_2, C_1) = \frac{|C_1 \cap C_2|}{|C_2|} \tag{2} \]

    (1) and (2) indicate the degree of closeness of one class to another. Say, for a class X found in two ontologies, computing (1) gives a value of 1 and (2) gives a value of 0.5; we can deduce that all the features present in C1 are also present in C2, whereas only 50 percent of the features of C2 are reflected in C1. Therefore, it may be concluded that C1 and C2 represent the same concept.
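The asymmetry of the measure is easy to see in code. The sketch below is a direct reading of formulas (1) and (2) over plain sets; the function name, the treatment of an empty set as 0.0, and the feature "Email" (added only so that the second value comes out at 0.5) are assumptions made for illustration.

```python
def relative_similarity(c1, c2):
    """R(C1, C2) = |C1 & C2| / |C1|: share of C1's features also found in C2."""
    c1, c2 = set(c1), set(c2)
    if not c1:
        return 0.0  # assumption: an empty feature set matches nothing
    return len(c1 & c2) / len(c1)


# Hypothetical feature sets: C1 is fully contained in C2.
c1 = {"Name", "Age", "SSN"}
c2 = {"Name", "Age", "SSN", "Address", "PhoneNo", "Email"}

print(relative_similarity(c1, c2))  # every feature of C1 appears in C2
print(relative_similarity(c2, c1))  # half of C2's features appear in C1
```

This reproduces the situation described above: one direction yields 1 and the other 0.5, which a single symmetric score would hide.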


    For example, let us consider that the two classes listed in Table 2 have been extracted from different ontologies:

    Table 2 Classes Extracted from Two Different Ontologies

    Class Name: Fisheggs
    1st Ontology - UNPC; SuperClass: Other-animal-products
    2nd Ontology - Taprdf; SuperClass: Egg; SubClass: Cavior

    Class Name: FishTopping
    1st Ontology - DisjointWith: CheeseTopping, FruitTopping, HerbSpiceTopping, MeatTopping, NutTopping, SauceTopping, VegetableTopping; SubClass: AnchoviesTopping, MixedSeafoodTopping, PrawnsTopping; SuperClass: PizzaTopping; Property: hasSpiciness, Mild
    2nd Ontology - DisjointWith: DairyTopping, FruitTopping, HerbSpiceTopping, MeatTopping, NutTopping, SauceTopping, VegetableTopping; SubClasses: AnchoviesTopping, MixedSeafoodTopping, PrawnsTopping; SuperClasses: PizzaTopping

    For the two classes we can identify the following:

    C1 = {E1, Op1, Dp1, S1} and C2 = {E2, Op2, Dp2, S2}

    Since the elements of the sets under consideration are strings, some string matching mechanism is required. First, we identify the cases in which this measure is applicable.

    Case 1: The measure is computed when two namesake classes are found in different ontologies. Plural variants of class names have to be considered; for example, for FishEgg and FishEggs this measure should be applied. For such cases we use the Levenshtein algorithm [17][18], also called edit distance, which calculates the least number of edit operations necessary to modify one string to obtain another.

    Case 2: RSSM is computed for classes which are synonyms of each other.

    Therefore, we compute the relative similarity only when two class labels are within a Levenshtein Edit Distance (LED) of not more than 1 and with similarity (SIM) >= 0.5, or if the classes are synonyms.
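The gating condition can be sketched as follows. The edit distance is the standard dynamic-programming Levenshtein algorithm; the paper does not define SIM in this section, so the normalisation SIM = 1 - LED / max(len) used here is an assumption, as are the function names, the case-insensitive comparison and the representation of synonyms as a set of label pairs.

```python
def levenshtein(s, t):
    """Least number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def label_similarity(s, t):
    """Assumed SIM: 1 - LED / length of the longer label."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def should_compare(a, b, synonyms=frozenset()):
    """True when RSSM should be computed for labels a and b.

    `synonyms` is a set of frozenset({label, label}) pairs in lower case.
    """
    a_l, b_l = a.lower(), b.lower()
    if frozenset((a_l, b_l)) in synonyms:
        return True
    return levenshtein(a_l, b_l) <= 1 and label_similarity(a_l, b_l) >= 0.5


print(should_compare("FishEgg", "FishEggs"))  # plural variant, LED = 1
print(should_compare("Pizza", "Wine"))        # unrelated labels
```

In practice the synonym set would come from a lexical resource; here it is simply passed in by the caller.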

    For the class FishEggs found in two ontologies the computation is done as follows:

    C1 = {Name = FishEggs, E1 = SuperClass: other-animal-products}

    Here |E1| = 3, since other, animal and products are treated as three words defining the SuperClass of FishEggs. Since there is no SubClass for the class, the SubClass parameter is set to 0.

    C2 = {Name = FishEggs, E2 = SuperClass: Eggs and SubClass: Cavior}

    The parameters under consideration are matches between the super classes of the two classes; where other features are present in one and absent in the other, it is assumed that one is a more elaborate definition than the other.


    Relative similarity of C1 to C2 and of C2 to C1:

    \[ R(C_1, C_2) = \frac{|E_1 \cap E_2|}{|E_1|} = \frac{0}{3} = 0 \]

    \[ R(C_2, C_1) = \frac{|E_1 \cap E_2|}{|E_2|} = \frac{0}{2} = 0 \]

    Therefore, for the class FishEggs there is only a label match but no feature match, so the two definitions should be treated as separate classes and left for the ontology editor to decide which one to accept.

    Now, computing the relative similarity for FishTopping:

    C1 = {Name = FishTopping,
          E1 = SuperClass: PizzaTopping; SubClasses: AnchoviesTopping, MixedSeafoodTopping, PrawnsTopping; DisjointWith: CheeseTopping, FruitTopping, HerbSpiceTopping, MeatTopping, NutTopping, SauceTopping, VegetableTopping,
          Op1 = hasSpiciness,
          S1 = Mild}

    C2 = {Name = FishTopping,
          E2 = SuperClass: PizzaTopping; SubClasses: AnchoviesTopping, MixedSeafoodTopping, PrawnsTopping; DisjointWith: DairyTopping, FruitTopping, HerbSpiceTopping, MeatTopping, NutTopping, SauceTopping, VegetableTopping}

    Taking a count of the elements in the respective class definitions (|SuperClasses| + |SubClasses| + |DisjointWith|), with Name = 1 for both classes, we get

    |E1| = 1 + 3 + 7 = 11 and |E2| = 1 + 3 + 7 = 11

    Other features that are present in one class definition but not in the other (here Op1 and S1) are not used in computing the relative similarity of the two classes.

    Now we compute the relative similarity of C1 to C2 and of C2 to C1:

    \[ R(C_1, C_2) = \frac{1 + 3 + 6}{11} \approx 0.91 \]

    \[ R(C_2, C_1) = \frac{1 + 3 + 6}{11} \approx 0.91 \]

    We can see that both relative similarities are the same and relatively high, and therefore the classes can be merged into a single class.
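The FishTopping count can be reproduced with a few lines of set arithmetic. This is a sketch under one stated assumption: only feature groups present in both definitions (SuperClass, SubClass, DisjointWith) enter the computation, which is how the sentence about features present in only one definition is read here. The function name and the dictionary layout are hypothetical.

```python
def relative_similarity_by_group(c1, c2):
    """Matches over the feature groups shared by both definitions,
    divided by the number of c1's features in those groups."""
    shared_groups = [g for g in c1 if g in c2]
    matched = sum(len(c1[g] & c2[g]) for g in shared_groups)
    total = sum(len(c1[g]) for g in shared_groups)
    return matched / total if total else 0.0


common_disjoint = {"FruitTopping", "HerbSpiceTopping", "MeatTopping",
                   "NutTopping", "SauceTopping", "VegetableTopping"}
subclasses = {"AnchoviesTopping", "MixedSeafoodTopping", "PrawnsTopping"}

c1 = {"SuperClass": {"PizzaTopping"},
      "SubClass": set(subclasses),
      "DisjointWith": common_disjoint | {"CheeseTopping"},
      "Property": {"hasSpiciness"}}   # present only in C1, so ignored
c2 = {"SuperClass": {"PizzaTopping"},
      "SubClass": set(subclasses),
      "DisjointWith": common_disjoint | {"DairyTopping"}}

r12 = relative_similarity_by_group(c1, c2)
r21 = relative_similarity_by_group(c2, c1)
print(round(r12, 2), round(r21, 2))  # both equal (1 + 3 + 6) / 11
```

The only mismatch is CheeseTopping versus DairyTopping in the DisjointWith group, which is why six of the seven disjoint classes count as matches.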

    The challenge here is to decide on the threshold value upon which to render two classes similar or dissimilar. Based on the above examples and the results obtained from the chosen domain of food ontology, given in Table 3, we consider the following thresholds suitable.

    The relative similarity of Class A and Class B is computed when either they are synonyms or when LED(A, B) <= 1 and SIM(A, B) >= 0.5.

    Classes are considered similar and are integrated if, for

    \[ R(C_1, C_2) = \alpha \quad \text{and} \quad R(C_2, C_1) = \beta, \tag{3} \]

    (\alpha > 0.25 and \beta > 0.5) or (\alpha > 0.5 and \beta > 0.25). Otherwise the classes are considered dissimilar and are therefore represented as separate entities, leaving the user to decide which definition to retain or discard.
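The threshold rule (3) amounts to a two-line predicate. The sketch below applies it to a few of the value pairs reported in Table 3; the function name and the parameter names `low` and `high` are illustrative, with defaults set to the thresholds 0.25 and 0.5 given above.

```python
def should_integrate(r12, r21, low=0.25, high=0.5):
    """Rule (3): integrate when one relative similarity exceeds `high`
    and the other exceeds `low`."""
    return (r12 > low and r21 > high) or (r12 > high and r21 > low)


# (R(C1, C2), R(C2, C1)) pairs as listed in Table 3.
table_3 = {
    "Fisheggs": (0, 0),
    "NamedPizza": (0.33, 1),
    "Pizza": (0.39, 0.52),
    "TomatoTopping": (1, 0.83),
}
for name, (r12, r21) in table_3.items():
    print(name, should_integrate(r12, r21))
```

Under this rule Fisheggs stays as two separate entities, while NamedPizza is integrated even though only a third of one definition is covered, because the other definition is covered completely.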


    3.2.3. Adjacency Matrix - Intuitive Method to Display Relationships amongst Learned Classes

    We believe that the ontology editor should be given an intuitive environment for visualizing the relationships SuperClassOf, SubClassOf, EquivalentClass, DisjointWith, Domain and Range, by displaying these in the form of an adjacency matrix. This type of visualization has not been done in ontology editor environments, which mostly rely on hierarchical or node-link types of representation.

    The advantages of this approach are: i) it gives an uncluttered interface, free of the scrambling that results when there are multiple edge crossings amongst the classes, as can be seen in Figure, and ii) the user can quickly locate a class and find the type of relationship it has with other classes by a simple scroll.

    The genesis of this idea came from recent work on visualization tools [19][20] for the semantic web that use an adjacency matrix to visualise huge RDF graphs, with the focus on visualizing large instance sets and the relations that connect them.

    An adjacency matrix is a data structure used for depicting edges between two nodes of a graph. In this framework a weighted adjacency matrix is used, where the weights are represented by different colours and a colour indicates the type of relationship that exists between two nodes. Pseudocode for the integration methodology is as follows:

    Pseudocode - Automatic Semantic Integration Incorporating Semantic Repositories

    Input: OWL NAMESPACES / RDF GRAPHS
    Output: ADJACENCY MATRIX and ENTITY LIST / CLASS DICTIONARY

    START
    Input: Set of all Candidate Namespaces, and Keyword Dictionary (KD)

    Start Loop: All Namespaces N
        Select a namespace N
        Loop through words in Keyword Dictionary (KD) and search in N
            - If match, add to Intermediate Adjacency Matrix (IAM)
            - If match, retrieve entity data and store in Class Dictionary (CD)
        Loop: Retrieve Relationships
            For all the classes/entities, say represented by set C, found in N, retrieve the relationships they have with each other. Update IAM accordingly.
        End Loop: Retrieve Relationships
        Output: IAM, CD
    End Loop: All Namespaces N

    /* Once all the candidate namespaces have been processed, the next step is to integrate them to learn:
       - the complete definition of an Entity
       - the relationships between Entities
    */

    Start Loop: All IAMs
        Select next IAM and CD
        If first IAM then
            Add the entities to the Final Adjacency Matrix (FAM) from CD; also retain the relationships for these entities from IAM.
        If not first IAM then
            Start Process: List CD
                Compare entities in CD to the Final Class Dictionary (FCD)
                Compute RSSM if Synonyms or (LED <= 1 and SIM >= 0.5)
                - If RSSM satisfies the thresholds, perform UNION of attributes and update FCD; no change in FAM except for a relationship update, i.e. if E1 in CD is similar to E2, then the relationships that exist between E1 and other CD objects have to be added in FAM for E2.
                - If not similar, add the entity and entity data to FCD and add the entity to FAM.
                Once all entities in CD for the present Namespace have been exhausted, update FAM to include all relationships that exist between entities in CD, as represented by the IAM for this Namespace.
            End Process: List CD
        Next IAM
    End Loop: All IAMs

    Output: FAM, FCD
    END
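The second half of the pseudocode, merging per-namespace results into FCD and FAM, can be sketched in a few dozen lines. This is an illustrative reading, not the authors' implementation: the data layout (a class dictionary as name-to-feature-set mapping, an adjacency matrix as a mapping from class pairs to a relationship label), the `same_label` gate standing in for the LED/synonym check, and the `Name#index` key used to keep a dissimilar namesake apart are all assumptions. The toy input reuses class and feature names from Tables 1 and 2.

```python
def relative_similarity(a, b):
    """R(a, b) = |a & b| / |a|, with 0.0 for an empty a (assumption)."""
    return len(a & b) / len(a) if a else 0.0


def same_label(a, b):
    """Placeholder gate: exact label match ignoring case (assumption)."""
    return a.lower() == b.lower()


def integrate(namespaces, comparable=same_label, low=0.25, high=0.5):
    """namespaces: list of (CD, IAM) pairs, one per namespace.

    CD maps class name -> set of feature strings.
    IAM maps (source class, target class) -> relationship label.
    Returns the final class dictionary (FCD) and adjacency matrix (FAM).
    """
    fcd, fam = {}, {}
    for index, (cd, iam) in enumerate(namespaces):
        candidates = list(fcd)  # entities learned from earlier namespaces
        renamed = {}            # name in this namespace -> key used in FCD
        for name, features in cd.items():
            target = None
            for existing in candidates:
                if not comparable(name, existing):
                    continue
                r12 = relative_similarity(features, fcd[existing])
                r21 = relative_similarity(fcd[existing], features)
                if (r12 > low and r21 > high) or (r12 > high and r21 > low):
                    target = existing
                    break
            if target is None:
                # Dissimilar or new: keep it as a separate entity.
                target = name if name not in fcd else f"{name}#{index}"
                fcd[target] = set(features)
            else:
                fcd[target] |= features  # UNION of attributes
            renamed[name] = target
        for (source, dest), relation in iam.items():
            fam[(renamed.get(source, source), renamed.get(dest, dest))] = relation
    return fcd, fam


ns1 = ({"FishTopping": {"PizzaTopping", "AnchoviesTopping", "CheeseTopping"},
        "Book": {"AuthorName", "Title", "Publisher"}},
       {("FishTopping", "PizzaTopping"): "SubClassOf"})
ns2 = ({"FishTopping": {"PizzaTopping", "AnchoviesTopping", "DairyTopping"},
        "Book": {"Ticket", "Status", "Date"}},
       {("FishTopping", "MeatTopping"): "DisjointWith"})

fcd, fam = integrate([ns1, ns2])
print(sorted(fcd))
print(fam)
```

In this toy run the two FishTopping definitions are merged by UNION, while the two Book definitions share no features and are kept as separate entries for the user to resolve. A coloured matrix view would then render each relationship label in `fam` as one cell colour.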

    3.2.4. Class Dictionary - Representation of Individual Learned Concepts

    The class dictionary is another output of this integration approach. It consists of the class definitions extracted from the ontologies. Based on LED or synonymy, as the case may be, the relative semantic similarity is computed for two classes, and if it is found favourable the UNION of the two definitions is stored in the class dictionary.

    3.2.5. Extraction and Integration: Food Ontology- An Example

    This section contains the results obtained by applying the extraction methodology to the namespaces identified in [4].

    The Final Class Dictionary (FCD) consists of 144 classes learned from the five identified namespaces. RSSM was computed for 23 pairs of classes, some of which are presented in Table 3, out of which 19 were found to satisfy the set thresholds and were thus integrated. The properties learnt are presented, in part, in Table 4, which lists the relations learned from across multiple ontologies; this is another important aspect of building ontologies from multiple sources. An ontology engineer can accept, filter out, or modify these according to the requirement.

    Table 3 Relative Semantic Similarity Measures of Learned Classes

    Class Name            R(C1, C2)   R(C2, C1)
    Fisheggs              0           0
    FishTopping           0.95        0.95
    MeatTopping           0.95        0.95
    MixedSeafoodTopping   1           1
    NamedPizza            0.33        1
    Pizza                 0.39        0.52
    PizzaBase             1           1
    PizzaTopping          0.88        0.88
    TomatoTopping         1           0.83


    Figure 2 is a snapshot of the adjacency matrix in which part of the key classes learnt about the concept Pizza is presented. A bubble representation of the concepts representing Pizza is illustrated in Figure 3. It can be seen that the adjacency matrix in Figure 2 is free from the cross links that exist in the bubble representation. While the bubble representation is handier for a well-formed ontology, the adjacency matrix representation is more conducive during the design phase, where one is assessing the type of relationships that may exist between various classes.

    Table 4 Learned Properties, Domain and Range

    Extracted Properties   Domain         Range
    hasBase                Pizza          PizzaBase
    hasTopping             NamedPizza     PizzaTopping
    hasSpiciness           PizzaTopping   Mild
    hasBody                Wine           Full, Medium
    hasSugar               Wine           OffDry, Sweet
    hasMaker               Wine           Winery
    hasFlavor              Wine           Moderate, Strong

    [Figure 2: adjacency matrix whose rows and columns are the learned classes Burgundy, DessertWine, DryRedWine, DryWhiteWine, DryWine, FishTopping, Fruits, FruitTopping, Grapefruit, Grapes, MeatTopping, MeatyPizza, MixedSeafoodTopping, PastaSauce, Pizza, PizzaBase, PizzaTopping, RedWine, Sauce, SauceTopping, SweetWine, TomatoTopping, WhiteWine, Wine. Legend: SuperClassOf, SubClassOf, DisjointWith, Domain, IsDomainOf.]

    Figure 2 Part Adjacency Matrix
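
The adjacency matrix view can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own code: the function names are hypothetical, the hasBase and hasTopping triples follow Table 4, and the SubClassOf triples are made-up examples.

```python
from collections import defaultdict

# Illustrative (subject, relation, object) triples. The hasBase/hasTopping rows
# follow Table 4; the SubClassOf rows are hypothetical examples.
TRIPLES = [
    ("FishTopping", "SubClassOf", "PizzaTopping"),
    ("MeatTopping", "SubClassOf", "PizzaTopping"),
    ("Pizza", "hasBase", "PizzaBase"),
    ("NamedPizza", "hasTopping", "PizzaTopping"),
]


def build_matrix(triples):
    """Return (sorted class list, dict mapping (row, col) -> set of relation labels)."""
    classes = sorted({c for s, _, o in triples for c in (s, o)})
    cells = defaultdict(set)
    for s, rel, o in triples:
        cells[(s, o)].add(rel)
    return classes, dict(cells)


def render(classes, cells):
    """Render the matrix as text rows, one per class, with '.' for empty cells."""
    lines = []
    for row in classes:
        marks = [",".join(sorted(cells.get((row, col), []))) or "." for col in classes]
        lines.append(f"{row}: " + " | ".join(marks))
    return lines


classes, cells = build_matrix(TRIPLES)
for line in render(classes, cells):
    print(line)
```

Because every relation occupies exactly one cell, no two relations can visually cross, which is the property the text attributes to the matrix view.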


    Figure 3 Part Bubble Diagram Showing Learned Pizza Concept

    4. ONTOLOGY EVALUATION

    Ontology evaluation is concerned with checking to what extent the developed ontology conforms to the requirements. Evaluation becomes easier if a reference ontology is available, because the Golden Standard Methodology [21] can then be applied. This may not always be the case, however, and we therefore suggest assessment by humans as the appropriate approach, since it covers all the levels of evaluation proposed by [21]: lexical, vocabulary, concept and data; hierarchy/taxonomy; other semantic relations; context/application; syntactic; and architecture and design.

    5. EVALUATION OF FRAMEWORK

    We validate our approach by evaluating the results obtained when building an ontology with this framework, which can still be done using the Golden Standard Method [21]. Using the golden standard gives us flexibility in two ways:

    - The framework can be evaluated in a general setup, as the reference ontology can be from any domain.

    - We can see to what degree our approach is able to extract correct concepts, classes and semantic relationships relative to an existing ontology.

    5.1. GOLDEN STANDARD METHOD OF EVALUATION

    The Golden Standard Method evaluates at four levels: level 1, lexical, vocabulary, concept and data; level 2, hierarchy and taxonomy; level 3, other semantic relations; level 4, syntactic. To apply the method we select a reference ontology that has been developed by domain experts and exists on the web. The reference ontology selected represents concepts from the university domain.

    Level 1 Lexical, vocabulary, concept, data


    Table 5 Concept Matrix

    Concept- Publication

    Publication, Article, Book, Conference, Journal

    Publication, Technical Report, Workshop Paper

    Publication, Journal, Special, Issue, Online

    Concept- Person

    Person, Employee, Academic, Staff, Administrative

    Administrative, Staff, Secretary, Technical, Organization

    Concept- Organization

    University, Student, PhD, research, group

    Organization, Department, Institute, Research Group, University

    Concept- Conference

    Activity, Event, Conference, Meeting, Workshop

    Level 2 Hierarchy, taxonomy under consideration is presented in Table 6

    Table 6 Hierarchy/Taxonomy

    Publication (SuperClass)
    ---Article (Class)
    ---ArticleInBook (SubClass)
    ---ConferencePaper (SubClass)
    ---JournalArticle (SubClass)
    ---TechnicalReport (SubClass)
    ---WorkshopPaper (SubClass)
    ---Book (Class)
    ---Journal (Class)
    ---SpecialIssuePublication (SubClass)
    ---OnlinePublication (Class)

    Organization (SuperClass)
    ---Department (SubClass)
    ---Institute (SubClass)
    ---ResearchGroup (SubClass)
    ---University (SubClass)

    Person (SuperClass)
    ---Employee (Class)
    ---AcademicStaff (SubClass)
    ---Lecturer (SubClass)
    ---Researcher (SubClass)
    ---PhDStudent (SubClass)
    ---AdministrativeStaff (SubClass)
    ---Secretary (SubClass)
    ---TechnicalStaff (SubClass)
    ---Student (SubClass)
    ---PhDStudent (SubClass)

    Event (SuperClass)
    ---Activity (SubClass)
    ---Conference (SubClass)
    ---Meeting (SubClass)
    ---Workshop (SubClass)

    Level 3 Other semantic relations define constraints on a class

    Consider the class Article in Table 7: in the reference ontology, a keyword associated with it can only be a string, and the author of an article can only be from the class Person. Sample semantic relations for the class Article, as defined in the reference ontology:


    Table 7 Reference Class Name Article description

    Reference Ontology: Ka.Owl
    Lexical/Vocabulary: Article
    Other Semantic Relations: keyword only string; author only person; title only string; online version only OnlinePublication class; year only integer; abstract only string
    Syntactic: OWL

    Level 4 Syntactic Owl Description

    5.2. Evaluation

    We applied our method to the concepts identified at level 1 from the reference ontology. Table 5 shows the concepts that form the input to the system.

    Table 8 gives the aggregated ranked namespaces:

    Table 8 Aggregated Ranked Ontologies

    Namespace                                          Alias Name    Weight
    http://annotation.semanticweb.org/iswc/iswc.owl    Annotation    2.6
    http://morpheus.cs.umbc.edu/aks1/ontosem.owl       Morpheus      4.2
    http://purl.oclc.org/NET/nknouf/ns/bibtex          Bibtex        1
    http://swrc.ontoware.org/ontology                  SWRC          6.2
    http://www.aktors.org/ontology/portal              Aktors        6.4

    The 4th namespace in Table 8, alias SWRC, receives a high score, implying that it has good concept coverage. This is because it is another version of the reference ontology; it is therefore excluded from the later stages of the evaluation to avoid bias. The 5th namespace in Table 8, alias Aktors, is a huge ontology with many classes defined, which is perhaps why it gives the greatest concept coverage, with the maximum aggregate of 6.4.
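
The ranking and the exclusion of SWRC can be reproduced with a short sketch. The weights are copied from Table 8; the function name `rank_namespaces` and its `excluded` parameter are hypothetical and only illustrate the step described above.

```python
# Aggregate weights copied from Table 8 (alias -> weight).
WEIGHTS = {
    "Annotation": 2.6,
    "Morpheus": 4.2,
    "Bibtex": 1.0,
    "SWRC": 6.2,
    "Aktors": 6.4,
}


def rank_namespaces(weights, excluded=()):
    """Return (alias, weight) pairs sorted by descending weight, skipping excluded aliases."""
    kept = {alias: w for alias, w in weights.items() if alias not in excluded}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# SWRC is a version of the reference ontology, so it is left out to avoid bias.
ranking = rank_namespaces(WEIGHTS, excluded={"SWRC"})
for alias, weight in ranking:
    print(f"{alias}: {weight}")
```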

    The next stage in the framework is the extraction of concepts. A total of 297 classes are identified as potential classes by the proposed extraction method. Since we have a reference ontology, the exact classes to look for are known, and the list can therefore be reduced to exact matches with the reference ontology. The reduced list of 24 potential classes is shown in Table 9.

    Table 9 List of Potential Classes

    Academic, Activity, Article, Book, Booklet, Conference, Department, Employee, Event, InBook, Institute, Journal, Meeting, Organization, Person, PhdThesis, Publication, Research, Researcher, Student, Secretary, TechReport, Workshop, University
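
The reduction of candidate classes to exact lexical matches might look like the following sketch. It assumes case-insensitive string equality as the matching rule, which the text does not spell out; the candidate list is a hypothetical excerpt and `exact_matches` is an illustrative name.

```python
def exact_matches(candidates, reference_terms):
    """Keep candidate class names that match a reference term, ignoring case."""
    wanted = {term.lower() for term in reference_terms}
    return [name for name in candidates if name.lower() in wanted]


# Hypothetical excerpt of extracted candidate classes (the paper reports 297).
candidates = ["Article", "Booklet", "Pizza", "Conference", "Wine", "secretary"]
# A few key terms taken from the concept matrix in Table 5.
reference_terms = ["Article", "Book", "Conference", "Secretary", "Workshop"]

reduced = exact_matches(candidates, reference_terms)
# Share of reference terms that were found among the candidates.
coverage = len({name.lower() for name in reduced}) / len(reference_terms)
print(reduced, f"coverage={coverage:.2f}")
```

Note that under this strict rule Booklet does not match Book; a looser rule (for example, an edit-distance cut-off) would be a different design choice.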

    5.2.1 Result Analysis

    The reduced list of classes contains exact lexical equivalents of classes from the reference ontology. Of the 28 key terms from which we had formed the concepts, 24 have been identified by this framework from across multiple ontologies. Level 1 of the Golden Standard Method is thus accomplished, and it may be concluded that:


    - The framework successfully retrieved a relevant namespace: it not only identified the namespace that is a version of the reference ontology, but gave it the second-highest aggregate, behind only an ontology that has many concepts.

    - The framework is able to identify 24 of the 28 key terms that form the concepts from across multiple ontologies.

    The Golden Standard Method of ontology evaluation has been explored for the evaluation of learned ontologies against a reference ontology, and [22] has emerged as an evaluation method that considers not only the lexical layer but also the concept hierarchies of the learned and the reference ontology.

    Level 2 of the Golden Standard Method consists of checking for taxonomic similarities between the learned concepts and the ones in the reference ontology. To perform Level 2 we have used the OnteEval Tool [23], which is an implementation of [22], on the individual namespaces that were retrieved in the first stage of the framework execution. The output of running the algorithm on each namespace together with the reference ontology is the set of concepts that are found similar in both. Table 10 gives the result.

    Table 10 Result of OnteEval Tool

    Class

    Namespace

    Bibtex Aktors Morpheus Annotation

    Article x x

    Book x x x x

    Conference x x x

    Department x

    Employee x x

    Event x x x

    Institute x x

    Journal x

    Meeting x

    Organization x x x

    Person x x x

    Publication x x

    Researcher x x

    Secretary x x x

    Student x x x

    University x x x

    Workshop x x x

    It can be seen that the concept hierarchies of as many as 17 classes across the ontologies are found to be similar to the ones in the reference ontology, leading to the conclusion that their integration is plausible.

    The results of Level 2 underline two aspects that suggest the framework will lead to correct concept formation:

    - Our method is able to identify concepts present across multiple ontologies. For instance, Table 10 shows that the class Department is found in the namespace Annotation only, whereas the class Book is found in all the ontologies.


    - Integration will possibly lead to correct concept formulation, as each class in Table 10 is also deemed similar to one in the reference ontology by the OnteEval tool.
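
The idea of a taxonomic comparison at Level 2 can be illustrated with a simplified cotopy-style overlap: for a concept, collect it together with its ancestors and descendants in each hierarchy, then compare the two sets. This is only a sketch loosely in the spirit of [22], not the OnteEval implementation. The function names are hypothetical, the reference hierarchy is the Event group of Table 6, and the learned hierarchy uses the subclasses of the learnt Event entry in Table 12.

```python
def cotopy(concept, parent_of):
    """Return the concept together with all its ancestors and descendants.

    `parent_of` maps each child class to its single parent (a tree is assumed).
    """
    related = {concept}
    # Walk up the hierarchy to collect ancestors.
    node = concept
    while node in parent_of:
        node = parent_of[node]
        related.add(node)
    # A class is a descendant if its ancestor chain passes through `concept`.
    for child in parent_of:
        node = child
        while node in parent_of:
            node = parent_of[node]
            if node == concept:
                related.add(child)
                break
    return related


# Reference hierarchy: Event group of Table 6 (child -> parent).
REFERENCE = {"Activity": "Event", "Conference": "Event",
             "Meeting": "Event", "Workshop": "Event"}
# Learned hierarchy: subclasses of the learnt Event class in Table 12
# (its learnt superclass is omitted here for simplicity).
LEARNED = {"Conference": "Event", "Workshop": "Event", "Tutorial": "Event"}


def taxonomic_overlap(concept, learned, reference):
    """Return (precision, recall) of the learned cotopy against the reference cotopy."""
    learned_set = cotopy(concept, learned)
    reference_set = cotopy(concept, reference)
    shared = learned_set & reference_set
    return len(shared) / len(learned_set), len(shared) / len(reference_set)


precision, recall = taxonomic_overlap("Event", LEARNED, REFERENCE)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the learned Event shares Event, Conference and Workshop with the reference, so three of its four cotopy members are confirmed and three of the five reference members are recovered.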

    Level 3 of the evaluation process compares the learned other semantic relations with the ones in the reference ontology. This level has been performed manually on the set of classes in Table 10. We have further verified whether the union of attributes that leads to class formulation, based on the RSSM proposed in this framework, leads to correct learning of the other semantic relations that are defined on a class in the reference ontology. It can be seen from Table 11 (in part) that not all classes found similar by the OnteEval tool across ontologies satisfy the RSSM defined by the framework. However, a comparison of the ones that satisfy the criterion, as shown in Table 12 (in part), depicts correct learning of lexical, hierarchical and other semantic relations (represented in italics).

    Table 11 Relative Semantic Similarity Measures for Learned Classes

    Class         Namespaces           RSSM R(C1,C2), R(C2,C1)    Merged
    Article       Morpheus/Bibtex      0, 0                       No
    Book          Aktors/Morpheus      0, 0                       No
    Book          Aktors/Annotation    0.38, 0.36                 No
    Book          Aktors/Bibtex        0.16, 0.20                 No
    Conference    Aktors/Annotation    0.5, 0.25                  Yes
    Department    Annotation           -                          -
    Employee      Aktors/Annotation    1, 0.35                    Yes
    Event         Aktors/Morpheus      0, 0                       No
    Event         Aktors/Annotation    0.5, 1                     Yes
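
The merge decisions in Table 11 can be checked against a simple rule. The sketch below assumes that a pair is merged when either directional RSSM value reaches 0.5; both the rule and the threshold value are assumptions that happen to agree with the rows shown, not the framework's stated criterion. The Department row is left out because it has no pair to compare.

```python
# (class, namespace pair, R(C1,C2), R(C2,C1), merged) rows copied from Table 11.
ROWS = [
    ("Article", "Morpheus/Bibtex", 0.0, 0.0, False),
    ("Book", "Aktors/Morpheus", 0.0, 0.0, False),
    ("Book", "Aktors/Annotation", 0.38, 0.36, False),
    ("Book", "Aktors/Bibtex", 0.16, 0.20, False),
    ("Conference", "Aktors/Annotation", 0.5, 0.25, True),
    ("Employee", "Aktors/Annotation", 1.0, 0.35, True),
    ("Event", "Aktors/Morpheus", 0.0, 0.0, False),
    ("Event", "Aktors/Annotation", 0.5, 1.0, True),
]

ASSUMED_THRESHOLD = 0.5  # assumption: the actual threshold is not given in this section


def should_merge(r12, r21, threshold=ASSUMED_THRESHOLD):
    """Assumed rule: merge when the larger directional measure reaches the threshold."""
    return max(r12, r21) >= threshold


for name, pair, r12, r21, merged in ROWS:
    decision = "Yes" if should_merge(r12, r21) else "No"
    print(f"{name:<11}{pair:<19}{decision}  (table: {'Yes' if merged else 'No'})")
```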

    Table 12 Actual Class and Those Resulting from Merging based on Relative Semantic Similarity Measures

    Class: Conference
      Namespace: Reference
        Neighbourhood: SuperClass: Event; DisjointWith: Activity, Meeting, SpecialIssueEvent, Workshop
        Semantic Relations: Number only string; Series only string; Location only string; atEvent only Event; publication only Publication; hasParts only Event; orgCommittee only Person; date only string; eventTitle only string; keyword only string
      Namespace: Learnt Class After Integration
        Neighbourhood: SuperClasses learnt: Meeting taking place, Event; DisjointWith learnt: Workshop
        Semantic Relations: EventProduct only Publication; Location as string; Date as string; EventTitle as string

    Class: Employee
      Namespace: Reference
        Neighbourhood: SuperClass: Person; SubClass: Academic staff, Administrative staff; DisjointWith: Student
        Semantic Relations: Address only string; fax only string; photo only string; email only string; lastName only string; name only string; middleInitial only string; phone only string; firstName only string; keyword only string
      Namespace: Learnt Class After Integration
        Neighbourhood: SuperClass: Person; SubClass: Educational Support Staff, Secretary, System-Administrator, GraphicDesigner, Multimedia; DisjointWith: Faculty Member, Researcher, Student
        Semantic Relations: Email string; Firstname string; Phone as string; name only string; Middle initial only string; Lastname string; Photo string; Fax string; Researchtopics topic; Homepage string; Has_affiliaiton only organization; Address s; Involvedin project only project

    Class: Event
      Namespace: Reference
        Neighbourhood: SuperClass: Object; SubClasses: Activity, Conference, Workshop, SpecialIssueEvent, Meeting
        Semantic Relations: atEvent only event; date only string; eventTitle only string; hasParts only Event; location only string; orgCommittee only person; publication only publication; keyword only string
      Namespace: Learnt Class After Integration
        Neighbourhood: SuperClasses: Thing-temporal thing; SubClass: Conference, Workshop, Tutorial
        Semantic Relations: Date string; eventTitle string; location string

    The conclusions from performing Level 3 are summarized as follows:

    - The classes merged based on RSSM lead to correct learning of a class, judged against the class definition in the reference ontology.

    - The framework was able to uncover other semantic relations that were defined in other ontologies and were similar to the ones defined for a class in the reference ontology.
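
The manual Level 3 comparison can be approximated by matching property restrictions on name and range. The sketch below uses the Conference entries of Table 12; the normalisation rule (ignoring case, spaces and underscores) and the function names are assumptions made for illustration, not part of the framework.

```python
def normalise(name):
    """Normalise a property name by dropping spaces/underscores and lowercasing."""
    return name.replace("_", "").replace(" ", "").lower()


# Property -> range for class Conference, transcribed from Table 12.
REFERENCE = {
    "Number": "string", "Series": "string", "Location": "string",
    "atEvent": "Event", "publication": "Publication", "hasParts": "Event",
    "orgCommittee": "Person", "date": "string", "eventTitle": "string",
    "keyword": "string",
}
LEARNED = {
    "EventProduct": "Publication", "Location": "string",
    "Date": "string", "EventTitle": "string",
}


def matched_relations(learned, reference):
    """Return learned properties whose normalised name and range both occur in the reference."""
    ref = {normalise(prop): rng.lower() for prop, rng in reference.items()}
    return sorted(prop for prop, rng in learned.items()
                  if ref.get(normalise(prop)) == rng.lower())


matched = matched_relations(LEARNED, REFERENCE)
print(matched)
```

EventProduct is not matched by this name-based rule even though its range agrees with the reference property publication; recognising that kind of correspondence is where a human assessor is still needed.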

    6. CONCLUSION

    In this paper an approach to the extraction and integration of concepts/classes across multiple ontologies is proposed and evaluated on a well-formed ontology. Concept similarity needs to be computed through the prism of similarity versus dissimilarity of features in order to address natural language disambiguation issues, where similar names in two different ontologies may represent the same concept or an entirely different one. The problem of concept similarity is reduced to set/feature-based matching, and a Relative Semantic Similarity Measure is proposed for its computation.


    A novel way of presenting the learned relationships using an adjacency matrix is proposed, which is a convenient representation for the ontology developer during the design phase of an ontology.

    This framework allows ontology editors to reuse ontologies that exist on the Semantic Web. It automates the process of finding relevant ontologies as well as identifying and integrating concepts across multiple ontologies. The framework has some limitations, however. It works best for domains defined using natural language; for domains formed of technical or symbolic representations, such as the medical and chemical domains, where word sense disambiguation techniques are not applicable, this approach may not lead to the best results. The availability of ontologies for a domain also affects the results of this framework.

    7. FUTURE WORK

    A research challenge relevant to ontologies is how semantic repositories function over time, taking account of their necessary maintenance and deployment. This promising area of research is termed ontology evolution. Ontologies are conceptualizations of domains, which are affected by changes in the world, and they therefore need to evolve to stay relevant to the model of the world they represent. Other factors that cause ontologies to evolve are corrections of design flaws, changing user and business requirements, and a shift of focus within a domain.

    This framework makes use of ontologies that exist in the decentralized environment of the web. The likelihood that changes to, or evolution of, these ontologies will have to be reflected in the ontology created using this framework cannot be ruled out. Therefore, in future the functionality of the kernel should be expanded to include a module that takes care of any such needs.

    REFERENCES

    [1] S.A.M Rizvi & Nadia Imdadi (2008), Framework for Automatic Semantic Integration of Semantic Repositories, International Conference on Semantic e-Business & Enterprise Computing, Kerala, India.

    [2] Nadia Imdadi & S.A.M Rizvi (2010), Framework for Automatic Reuse of Existing Online Semantic Resources by Facilitating Concept Extraction Using Word Sense Disambiguation in Computational Linguistics Techniques, International Conference on Semantic Web & Web Services, WorldComp, Nevada, USA.

    [3] Nadia Imdadi & S.A.M Rizvi (2010), Automating Reuse of Semantic Repositories in the Context of

    Semantic Web, International Conference on Semantic e-Business & Enterprise Computing, Springer,

    Tamil Nadu, India, pp 518-523 ISBN: 978-3-642-14493-6.

    [4] Nadia Imdadi & S.A.M Rizvi (2011), Using Hash based Bucket Algorithm to Select Online Ontologies for Ontology Engineering through Reuse, International Journal of Computer Applications, 28(7):21-25, August 2011. Published by Foundation of Computer Science, New York, USA.

    [5] Harith Alani (2006), Position paper: ontology construction from online ontologies, In Proceedings

    of the 15th international conference on World Wide Web, ACM, New York, NY, USA, 491-495.

    [6] Elena Simperl (2009), Reusing ontologies on the Semantic Web: A feasibility study, Data & Knowledge Engineering, Elsevier, 68, 905-925.


    [7] L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, V. C. Doshi, & J. Sachs (2004),

    Swoogle: A semantic web search & metadata engine, In Proc. 13th ACM Conf. on Information &

    Knowledge Management.

    [8] Ebiquity Group at UMBC, Swoogle Web Services, [Online]. Available:

    http://swoogle.umbc.edu/index.php?option=com_swoogle_manual&manual=search_overview

    [9] Grigoris Antoniou, Enrico Franconi, Frank Van Harmelen (2005), Introduction to Semantic Web Ontology Languages, Reasoning Web, Proceedings of the Summer School (Number 3564 in Lecture Notes in Computer Science), Malta.

    [10] OWL Web Ontology Language Reference (2004), W3C Recommendation 10 February 2004,

    http://www.w3.org/TR/owl-ref/.

    [11] Matthew H., Simon J., Georgina M., Alan R., Robert S., Chris Wroe (2007), A Practical Guide To Building OWL Ontologies Using Protege 4 & CO-ODE Tools Edition 1.1, The University Of Manchester, http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/resources/ProtegeOWLTutorialP4_v1_1.pdf.

    [12] The OWL API, University of Manchester, http://owlapi.sourceforge.net/

    [13] T. Bach, & R. Dieng-Kuntz (2005), Measuring Similarity of Elements in OWL DL Ontologies, The

    Twentieth National Conference on Artificial Intelligence, AAAI.

    [14] Le D. Ngan, Tran M. Hang, & Angela E. S. Goh (2006), Semantic Similarity between Concepts

    from Different OWL Ontologies, IEEE International Conference on Industrial Informatics.

    [15] Xiquan Yang, Ye Zhang, Na Sun, Deran Kong (2009), Research on Method of Concept Similarity Based on Ontology, Proceedings International Symposium on Web Information Systems & Applications, pp. 132-135, ISBN 978-952-5726-00-8.

    [16] Wikipedia The Free Encyclopedia, Jaccard index, http://en.wikipedia.org/wiki/Jaccard_index

    [17] The Levenshtein-Algorithm, http://www.levenshtein.net/index.html

    [18] Levenshtein Edit Distance, http://www.miislita.com/searchito/levenshtein-edit-distance.html

    [19] Benjamin Bach, Emmanuel Pietriga, Ilaria Liccardi, Gennady Legostaev (2011), OntoTrix: a hybrid

    visualization for populated ontologies, In Proceedings of the 20th International Conference

    Companion on World Wide Web, pp. 177-180, Hyderabad, India.

    [20] Benjamin Bach, Emmanuel Pietriga, Ilaria Liccardi, Gennady Legostaev (2011), RDF Visualization using a Three-Dimensional Adjacency Matrix, In Proceedings of the 4th International Semantic Search Workshop, Hyderabad, India.

    [21] Janez Brank, Marko Grobelnik, Dunja Mladenić (2005), A survey of ontology evaluation techniques, In Proceedings of the Conference on Data Mining & Data Warehouses.

    [22] Dellschaft, K. & Staab, S. (2006), On How to Perform a Golden Standard Based Evaluation of

    Ontology Learning, International Semantic Web Conference, pp. 228-241

    [23] Christopher Brewster, Jose Iria & Ziqi Zang (2007), Automating Ontology Learning for the

    Semantic Web, OnteEval Tool, Abraxas Project.