Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling...

9
Specifying Conceptual Models Using Restricted Natural Language Bayzid Ashik Hossain Department of Computing Sydney, NSW Australia [email protected] Rolf Schwitter Department of Computing Sydney, NSW Australia [email protected] Abstract The key activity to design an infor- mation system is conceptual modelling which brings out and describes the general knowledge that is required to build a sys- tem. In this paper we propose a novel ap- proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted form of natural language. A restricted natural language is a subset of a natural language that has well-defined computational prop- erties and therefore can be translated un- ambiguously into a formal notation. We will argue that a restricted natural lan- guage is suitable for writing precise and consistent specifications that lead to exe- cutable conceptual models. Using a re- stricted natural language will allow the domain experts to describe a scenario in the terminology of the application domain without the need to formally encode this scenario. The resulting textual specifica- tion can then be automatically translated into the language of the desired conceptual modelling framework. 1 Introduction It is well-known that the quality of an information system application depends on its design. To guar- antee accurateness, adaptability, productivity and clarity, information systems are best specified at the conceptual level using a language with names for individuals, concepts and relations that are easily understandable by domain experts (Bernus et al., 2013). Conceptual modelling is the most important part of requirements engineering and is the first phase towards designing an information system (Oliv´ e, 2007). The conceptual design pro- cedure generally includes data, process and be- havioral perceptions, and the actual database man- agement system (DBMS) that is used to imple- ment the design of the information system (Bernus et al., 2013). The DBMS could be based on any of the available data models. Designing a database means constructing a formal model of the desired application domain which is often called the uni- verse of discourse (UOD). Conceptual modelling involves different parties who sit together and de- fine the UOD. The process of conceptual mod- elling starts with the collection of necessary infor- mation from the domain experts by the knowledge engineers. The knowledge engineers then use tra- ditional modelling techniques to design the system based on the collected information. To design the database, a clear understanding of the application domain and an unambiguous in- formation representation scheme is necessary. Ob- ject role modelling (ORM) (Halpin, 2009) makes the database design process simple by using nat- ural language for verbalization, as well as dia- grams which can be populated with suitable ex- amples and by adding the information in terms of simple facts. On the other hand, entity relation- ship modelling (ERM) (Richard, 1990; Frantiska, 2018) does this by considering the UOD in terms of entities, attributes and relationships. Object- oriented modelling techniques such as the unified modelling language (UML) (O´ Regan, 2017) pro- vide a wide variety of functionality for specifying a data model at an implementation level which is suitable for the detailed design of an object ori- ented system. UML can be used for database de- sign in general because its class diagrams provide a comprehensive entity-relationship notation that can be annotated with database constructs. Alternatively, a Restricted Natural Language (RNL) (Kuhn, 2014) can be used by the domain experts to specify system requirements for con- ceptual modelling. A RNL can be defined as a subset of a natural language that is acquired by Bayzid Ashik Hossain and Rolf Schwitter. 2018. Specifying Conceptual Models Using Restricted Natural Language. In Proceedings of Australasian Language Technology Association Workshop, pages 44-52.

Transcript of Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling...

Page 1: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

Specifying Conceptual Models Using Restricted Natural Language

Bayzid Ashik HossainDepartment of Computing

Sydney, NSWAustralia

[email protected]

Rolf SchwitterDepartment of Computing

Sydney, NSWAustralia

[email protected]

Abstract

The key activity to design an infor-mation system is conceptual modellingwhich brings out and describes the generalknowledge that is required to build a sys-tem. In this paper we propose a novel ap-proach to conceptual modelling where thedomain experts will be able to specify andconstruct a model using a restricted formof natural language. A restricted naturallanguage is a subset of a natural languagethat has well-defined computational prop-erties and therefore can be translated un-ambiguously into a formal notation. Wewill argue that a restricted natural lan-guage is suitable for writing precise andconsistent specifications that lead to exe-cutable conceptual models. Using a re-stricted natural language will allow thedomain experts to describe a scenario inthe terminology of the application domainwithout the need to formally encode thisscenario. The resulting textual specifica-tion can then be automatically translatedinto the language of the desired conceptualmodelling framework.

1 Introduction

It is well-known that the quality of an informationsystem application depends on its design. To guar-antee accurateness, adaptability, productivity andclarity, information systems are best specified atthe conceptual level using a language with namesfor individuals, concepts and relations that areeasily understandable by domain experts (Bernuset al., 2013). Conceptual modelling is the mostimportant part of requirements engineering and isthe first phase towards designing an informationsystem (Olive, 2007). The conceptual design pro-cedure generally includes data, process and be-

havioral perceptions, and the actual database man-agement system (DBMS) that is used to imple-ment the design of the information system (Bernuset al., 2013). The DBMS could be based on any ofthe available data models. Designing a databasemeans constructing a formal model of the desiredapplication domain which is often called the uni-verse of discourse (UOD). Conceptual modellinginvolves different parties who sit together and de-fine the UOD. The process of conceptual mod-elling starts with the collection of necessary infor-mation from the domain experts by the knowledgeengineers. The knowledge engineers then use tra-ditional modelling techniques to design the systembased on the collected information.

To design the database, a clear understandingof the application domain and an unambiguous in-formation representation scheme is necessary. Ob-ject role modelling (ORM) (Halpin, 2009) makesthe database design process simple by using nat-ural language for verbalization, as well as dia-grams which can be populated with suitable ex-amples and by adding the information in terms ofsimple facts. On the other hand, entity relation-ship modelling (ERM) (Richard, 1990; Frantiska,2018) does this by considering the UOD in termsof entities, attributes and relationships. Object-oriented modelling techniques such as the unifiedmodelling language (UML) (O´ Regan, 2017) pro-vide a wide variety of functionality for specifyinga data model at an implementation level which issuitable for the detailed design of an object ori-ented system. UML can be used for database de-sign in general because its class diagrams providea comprehensive entity-relationship notation thatcan be annotated with database constructs.

Alternatively, a Restricted Natural Language(RNL) (Kuhn, 2014) can be used by the domainexperts to specify system requirements for con-ceptual modelling. A RNL can be defined as asubset of a natural language that is acquired by

Bayzid Ashik Hossain and Rolf Schwitter. 2018. Specifying Conceptual Models Using Restricted Natural Language. InProceedings of Australasian Language Technology Association Workshop, pages 44−52.

Page 2: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

constraining the grammar and vocabulary in orderto reduce or remove its ambiguity and complexity.These RNLs are also known as controlled naturallanguage (CNL) (Schwitter, 2010). RNLs fall intotwo categories: 1. those that improve the read-ability for human beings especially for non-nativespeakers, and 2. those that facilitate the automatedtranslation into a formal target language. The mainbenefits of an RNL are: they are easy to under-stand by humans and easy to process by machines.In this paper, we show how an RNL can be usedto write a specification for an information sys-tem and how this specification can be processedto generate a conceptual diagram. The grammarof our RNL specifies and restricts the form of theinput sentences. The language processor trans-lates RNL sentences into a version of descriptionlogic. Note that the conceptual modelling processusually starts from scratch and therefore cannotrely on existing data that would make this processimmediately suitable for machine learning tech-niques.

2 Related Work

There has been a number of works on formalizingconceptual models for verification purposes (Be-rardi et al., 2005; Calvanese, 2013; Lutz, 2002).This verification process includes consistency andredundancy checking. These approaches first rep-resent the domain of interest as a conceptualmodel and then formalize the conceptual modelusing a formal language. The formal representa-tion can be used to reason about the domain of in-terest during the design phase and can also be usedto extract information at run time through queryanswering.

Traditional conceptual modelling diagramssuch as entity relationship diagrams and unifiedmodelling language diagrams are easy to gener-ate and easily understandable for the knowledgeengineers. These modelling techniques are wellestablished. The problems with these conven-tional modelling approaches are: they have no pre-cise semantics and no verification support; theyare not machine comprehensible and as a conse-quence automated reasoning on the conceptual di-agrams is not possible. Previous approaches usedlogic to formally represent the diagrams and toovercome these problems. The description logic(DL) ALCQI is well suited to do reasoning withentity relationship diagrams (Lutz, 2002), UML

class diagrams (Berardi et al., 2005) and ORM di-agrams (Franconi et al., 2012). The DL ALCQIis an extension of the basic propositionally closeddescription logic AL and includes complex con-cept negation, qualified number restriction, and in-verse role.

Table 1 shows the constructs of the ALCQIdescription logic with suitable examples. It isreported that finite model reasoning with AL-CQI is decidable and ExpTime-complete1. Us-ing logic to formally represent the conceptual dia-grams introduces some problems too. For exam-ple, it is difficult to generate logical representa-tions, in particular for domain experts; it is alsodifficult for them to understand these representa-tions and no well established methodologies areavailable to represent the conceptual models for-mally. A solution to these problems is to usea RNL for the specification of conceptual mod-els. There exist several ontology editing andauthoring tools such as AceWiki (Kuhn, 2008),CLOnE (Funk et al., 2007), RoundTrip Ontol-ogy Authoring (Davis et al., 2008), Rabbit (De-naux et al., 2009), Owl Simplified English (Power,2012) that already use RNL for the specification ofontologies; they translate a specification into a for-mal notation. There are also works on mappingformal notation into conceptual models (Brock-mans et al., 2004; Bagui, 2009).

3 Proposed Approach

Several approaches have been proposed to uselogic with conventional modelling techniques toverify the models and to get the semantics of thedomains. These approaches allow machines to un-derstand the models and thus support automatedreasoning. To overcome the disadvantages asso-ciated with these approaches, we propose to usean RNL as a language for specifying conceptualmodels. The benefits of an RNL are: 1. the lan-guage is easy to write and understand for domainexperts as it is a subset of a natural language, 2.the language gets its semantics via translation intoa formal notation, and 3. the resulting formal no-tation can be used further to generate conceptualmodels.

Unlike previous approaches, we propose towrite a specification of the conceptual model inRNL first and then translate this specification intodescription logic. Existing description logic rea-

1http://www.cs.man.ac.uk/∼ezolin/dl/

45

Page 3: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

Construct Syntax Exampleatomic concept C Studentatomic role P hasChildatomic negation ¬C ¬ Studentconjunction C uD Student u Teacher(unqual.) exist. restriction ∃R ∃ hasChilduniversal value restriction ∀R.C ∀ hasChild.Malefull negation ¬(C uD) ¬ (Student u Teacher)qual. cardinality restrictions ≥ nR.C ≥ 2 hasChild.Femaleinverse role p− ∃hasChild−.Teacher

Table 1: The DL ALCQI.

soners2,3 can be used to check the consistencyof the formal notation and after that desired con-ceptual models can be generated from this nota-tion. Our approach is to derive the conceptualmodel from the specification whereas in conven-tional approaches knowledge engineers first drawthe model and then use programs to translate themodel into a formal notation (Fillottrani et al.,2012). Figure 1 shows the proposed system ar-chitecture for conceptual modelling.

3.1 Scenario

Let’s consider an example scenario of a learningmanagement system for a university stated below:

A Learning Management System (LMS) keepstrack of the units the students do during theirundergraduate or graduate studies at a particularuniversity. The university offers a number ofprograms and each program consists of a numberof units. Each program has a program name anda program id. Each unit has a unit code anda unit name. A student can take a number ofunits whereas a unit has a number of students. Astudent must study at least one unit and at mostfour units. Every student can enrol into exactlyone program. The system stores a student id anda student name for each student.

We reconstruct this scenario in RNL and af-ter that the language processor translates the RNLspecification into description logic using a feature-based phrase structure grammar (Bird et al., 2009).Our RNL consists of function words and contentwords. Function words (e.g., determiners, quanti-fiers and operators) describe the structure of the

2https://franz.com/agraph/racer/3http://www.hermit-reasoner.com/

RNL and their number is fixed. Content words(e.g, nouns and verbs) are domain specific and canbe added to the lexicon during the writing process.The reconstruction process of this scenario in RNLis supported by a look-ahead text editor (Guy andSchwitter, 2017). The reconstructed scenario inRNL looks as follows:

(a) No student is a unit.

(b) No student is a program.

(c) Every student is enrolled in exactly one pro-gram.

(d) Every student studies at least one unit and atmost four units.

(e) Every student has a student id and has a stu-dent name.

(f) No program is a unit and is a student.

(g) Every program is composed of a unit.

(h) Every program is enrolled by a student.

(i) Every program has a program id and has aprogram name.

(j) No unit is a student and is a program.

(k) Every unit is studied by a student.

(l) Every unit belongs to a program.

(m) Every unit has a unit code and has a unitname.

Additionally, we use the following terminologi-cal statements expressed in RNL:

(n) The verb studies is the inverse of the verbstudied by.

46

Page 4: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

Figure 1: Proposed system architecture for conceptual modelling using restricted natural language.

(o) The verb composed of is the inverse of theverb belongs to.

3.2 Grammar

A feature-based phrase structure grammar hasbeen built using the NLTK (Loper and Bird, 2002)toolkit to parse the above-mentioned specification.The resulting parse trees for these sentences arethen translated by the language processor into theirequivalent description logic statements. Below weshow a scaffolding of the grammar rules with fea-ture structures that we used for our case study:

S ->NP[NUM=?n, FNC=subj]VP[NUM=?n]FS

VP[NUM=?n] ->V[NUM=?n] NP[FNC=obj] |V[NUM=?n] Neg NP[NUM=?n, FNC=obj] |VP[NUM=?n] CC VP[NUM=?n]

NP[NUM=?n, FNC=subj] ->UQ[NUM=?n] N[NUM=?n] |NQ[NUM=?n] N[NUM=?n] |Det[NUM=?n] N[NUM=?n] |KP[NUM=?n] VB[NUM=?n] |KP[NUM=?n] VBN[NUM=?n]

NP[NUM=?n, FNC=obj] ->Det[NUM=?n] N[NUM=?n] |RB[NUM=?n] CD[NUM=?n] N[NUM=?n] |KP[NUM=?n] VB[NUM=?n] |KP[NUM=?n] VBN[NUM=?n]

V[NUM=?n] ->Copula[NUM=?n] |VB[NUM=?n] |Copula[NUM=?n] VBN[NUM=?n] |

Copula[NUM=?n] JJ[NUM=?n]

VB[NUM=pl] -> "study" | ...VB[NUM=sg] -> "studies" | ...VBN -> "studied" "by" | ...Copula[NUM=sg] -> "is"Copula[NUM=pl] -> "are"

JJ -> "inverse" "of"CC -> "and" | "or"

Det[NUM=sg] -> "A" | "a" | ...Det -> "The" | "the"

UQ[NUM=sg] -> "Every"NQ -> "No"

Neg -> "not"

N[NUM=sg] -> "student" | ...N[NUM=pl] -> "students" | ...

RB -> "exactly" | ...

CD[NUM=sg] -> "one"CD[NUM=pl] -> "two" | ... | "four"

KP -> "The" "verb" | ...

FS -> "."

In order to translate the resulting syntax treesinto the description logic representation, we haveused the owl/xml syntax4 of Web Ontology Lan-guage (OWL) as the formal target notation.

3.3 Case Study

The translation process starts by reconstructing thespecification in RNL that follows the rules of the

4https://www.w3.org/TR/owl-xmlsyntax/

47

Page 5: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

feature based grammar. While writing the specifi-cation in RNL, we tried to use the same vocabularyas in the natural language description.

The first two sentences of our specification usea negative quantifier in subject position and an in-definite determiner in object position:

(a) No student is a unit.

(b) No student is a program.

The translation of the sentence (a) into owl/xmlnotation results in the declaration of two atomicclasses student and unit which are disjointfrom each other.

<Declaration><Class IRI="\#student"/>

</Declaration><Declaration>

<Class IRI="\#unit"/></Declaration><DisjointClasses>

<Class IRI="\#student"/><Class IRI="\#unit"/>

</DisjointClasses>

Similarly, the translation of the sentence (b)results in the declaration of two disjoint atomicclasses student and program. Both sentences(a) and (b) are related to expressing atomic nega-tion in the DL ALCQI (see table 1).

<Declaration><Class IRI="\#student"/>

</Declaration><Declaration>

<Class IRI="\#program"/></Declaration><DisjointClasses>

<Class IRI="\#student"/><Class IRI="\#program"/>

</DisjointClasses>

The RNL that we have designed for this casestudy allows for verb phrase coordination; for ex-ample, the two above-mentioned sentences (a+b)can be combined in the following way:

(a+b) No student is a unit and is a program.

The translation of this sentence (a+b) results inthe declaration of three atomic classes student,unit and program where student is disjointfrom both unit and program.

<Declaration><Class IRI="\#student"/>

</Declaration><Declaration>

<Class IRI="\#unit"/></Declaration><Declaration>

<Class IRI="\#program"/>

</Declaration><DisjointClasses>

<Class IRI="\#student"/><Class IRI="\#unit"/>

</DisjointClasses><DisjointClasses>

<Class IRI="\#student"/><Class IRI="\#program"/>

</DisjointClasses>

Now let us consider the following RNL sen-tences that use a universal quantifier in subject po-sition and a quantifying expression in object posi-tion:

(c) Every student is enrolled in exactly one pro-gram.

(d) Every student studies at least one unit and atmost four units.

The universally quantified sentence (c) whichcontains a cardinality quantifier in the objectposition is translated into an object propertyenrolled in that has the class student asdomain and the class program as range with anexact cardinality of 1. This corresponds to a qual-ified cardinality restriction in the DL ALCQI.

<Declaration><ObjectProperty IRI="#enrolled_in"/>

</Declaration><ObjectPropertyDomain>

<ObjectProperty IRI="#enrolled_in"/><Class IRI="#student"/>

</ObjectPropertyDomain><ObjectPropertyRange>

<ObjectProperty IRI="#enrolled_in"/><ObjectExactCardinality cardinality="1">

<ObjectProperty IRI="#enrolled_in"/><Class IRI="#program"/>

</ObjectExactCardinality></ObjectPropertyRange>

The universally quantified sentence (d) whichhas a compound cardinality quantifier in object po-sition is translated into the object property studythat has the class student as domain and theclass unit as range with a minimum cardinalityof 1 and maximum cardinality of 4. The trans-lation of this sentence corresponds to a qualifiedcardinality restriction in the DL ALCQI.

<Declaration><ObjectProperty IRI="#study"/>

</Declaration><ObjectPropertyDomain><ObjectProperty IRI="#study"/><Class IRI="#student"/>

</ObjectPropertyDomain><ObjectPropertyRange><ObjectProperty IRI="#study"/><ObjectMinCardinality cardinality="1"><ObjectProperty IRI="#study"/>

48

Page 6: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

<Class IRI="#unit"/></ObjectMinCardinality>

</ObjectPropertyRange><ObjectPropertyRange>

<ObjectProperty IRI="#study"/><ObjectMaxCardinality cardinality="4"><ObjectProperty IRI="#study"/><Class IRI="#unit"/></ObjectMaxCardinality>

</ObjectPropertyRange>

The following RNL sentence has a universalquantifier in subject position and a coordinatedverb phrase with indefinite noun phrases in objectposition:

(e) Every student has a student id and has a stu-dent name.

The translation of this sentence (e) results intwo data properties for the class student. Thefirst data property is has student id with adata type integer and the second data prop-erty is has student name with the data typestring:

<Declaration><DataProperty IRI="\#has_student_id"/>

</Declaration><DataPropertyDomain>

<DataProperty IRI="\#has_student_id"/><Class IRI="#student"/>

</DataPropertyDomain><DataPropertyRange>

<DataProperty IRI="\#has_student_id"/><Datatype abbreviatedIRI="xsd:integer"/>

</DataPropertyRange><Declaration>

<DataProperty IRI="\#has_student_name"/></Declaration><DataPropertyDomain>

<DataProperty IRI="\#has_student_name"/><Class IRI="#student"/>

</DataPropertyDomain><DataPropertyRange>

<DataProperty IRI="\#has_student_name"/><Datatype abbreviatedIRI="xsd:string"/>

</DataPropertyRange>

(f) Every program is composed of a unit.

The universally quantified sentence (f) whichcontains an indefinite determiner in the objectposition is translated into the object propertycomposed of with the class program as do-main and the class unit as range. This is corre-sponds to an unqualified existential restriction inthe DL ALCQI.

<Declaration><ObjectProperty IRI="#composed_of"/>

</Declaration><ObjectPropertyDomain>

<ObjectProperty IRI="#composed_of"/>

<Class IRI="#program"/></ObjectPropertyDomain><ObjectPropertyRange>

<ObjectProperty IRI="#composed_of"/><ObjectSomeValuesFrom>

<ObjectProperty IRI="#composed_of"/><Class IRI="#unit"/>

</ObjectSomeValuesFrom></ObjectPropertyRange>

The following two RNL sentences have a defi-nite determiner both in subject position and objectposition and specify lexical knowledge for the lan-guage processor:

(g) The verb studies is the inverse of the verbstudied by.

(h) The verb composed of is the inverse of theverb belongs to.

The translation of these two sentences resultsin the specification of inverse object properties.The translation of sentence (g) leads to the objectproperties study and studied by which areinverse object properties. Similarly, the transla-tion of sentence (h) states that the object propertiescomposed of and belong to are also inverseobject properties. These statements correspond tothe inverse role construct in the DL ALCQI.

<InverseObjectProperties><ObjectProperty IRI="#study"/><ObjectProperty IRI="#studied_by"/>

</InverseObjectProperties><InverseObjectProperties>

<ObjectProperty IRI="#composed_of"/><ObjectProperty IRI="#belong_to"/>

</InverseObjectProperties>

The rest of the specification is similar to the ex-amples that we have discussed above.

4 Reasoning

After generating the owl/xml notation for the RNLspecification, we use Owlready (Lamy, 2017)that includes the description logic reasoner Her-miT (Glimm et al., 2014) for consistency check-ing. Owlready is a Python library for ontology-oriented programming that allows to load OWL2.0 ontologies and performs various reasoningtasks. For example, consistency checking of thespecification can be performed on the class level.If a domain expert writes for example ”No studentis a unit” and later specifies that ”Every unit is astudent”, then the reasoner can detect this incon-sistency and informs the domain expert about thisconflict. The owl/xml notation below shows how

49

Page 7: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

this inconsistency (”owl:Nothing”) is reported af-ter running the reasoner.<rdf:Description

rdf:about="http://www.w3.org/2002/07/owl#Nothing"><owl:equivalentClassrdf:resource=

"http://www.w3.org/2002/07/owl#Nothing"/><owl:equivalentClass rdf:resource="#student"/><owl:equivalentClass rdf:resource="#unit"/>

</rdf:Description>

This inconsistency can be highlighted directlyin the RNL specification; that means the domainexpert can fix the textual specification and doesnot have to worry about the underlying formal no-tation.

5 Conceptual Model Generation

In the next step, we extract necessary informa-tion such as a list of entities (classes), attributes(data properties) and relationships (object proper-ties) from the owl/xml file to generate the con-ceptual model. This information is extracted byexecuting XPath (Berglund et al., 2003)5 queriesover the owl/xml notation and then it is used tobuild a database schema containing a number oftables representing the entities with associated at-tributes. Relationships among the entities are rep-resented by using foreign keys in the tables. AnSQL script is generated containing SQLite com-mands6 for this database schema.

This SQL script is executed by using SQLite togenerate the corresponding database for the spec-ification. After that, we use SchemaCrawler7 togenerate the entity relationship diagram (see fig.2) from the SQL script. SchemaCrawler is afree database schema discovery and comprehen-sion tool that allows to generate diagrams fromSQL code.

For mapping a description logic representationto an entity relationship diagram, we have usedthe approach described by Algorithm 1. All theclasses in the OWL file become entities in the ER-diagram. Object properties are mapped into rela-tions between the entities and data properties aremapped into attributes for these entities. The qual-ified cardinality restrictions of the object prop-erties define relationship cardinalities in the dia-gram.

We understand conceptual modelling as a roundtripping process. That means a domain expert can

5https://www.w3schools.com/xml/xml xpath.asp6https://www.sqlite.org/index.html7https://www.schemacrawler.com/

Algorithm 1: Mapping description logic rep-resentation to SQLite commands for generat-ing entity relationship diagrams.

Input: Logical notation in description logicOutput: SQLite scriptentity list= extract class(owl/xml file);data property list=extract data property(owl/xml file);object property list=extract object property(owl/xml file);

for enity in entity list docreate table(entity)for data property indata property list do

add data property(entity,data property)

for object property inobject property list do

if cardinality == 1 thenadd data property(entity,data property)

endend

endfor object property inobject property list do

create relationship(entity,object property)

endend

write the RNL specification first, then generate theconceptual model from the specification, and thena knowledge engineer might want to modify theconceptual model. These modifications will thenbe reflected on the level of the RNL by verbalis-ing the formal notation. During this modificationprocess the reasoner can be used to identify in-consistencies found in a given specification and togive appropriate feedback to the knowledge engi-neer on the graphical level or to the domain experton the textual level.

6 Discussion

The outcome of our experiment justifies the pro-posed approach for conceptual modelling. Wehave used a phase structure grammar to convert aRNL specification into description logic. This ex-periment shows that it is possible to generate for-mal representations from RNL specifications and

50

Page 8: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

Figure 2: Entity relationship diagram generated from the formal representation using SchemaCrawler.

these formal representations can be mapped to dif-ferent conceptual models. The proposed approachfor conceptual modelling addresses two researchchallenges8: 1. providing the right set of mod-elling constructs at the right level of abstractionto enable successful communication among thestakeholders (i.e. domain experts, knowledge en-gineers, and application programmers); 2. pre-serving the ease of communication and enablingthe generation of a database schema which is a partof the application software.

7 Future Work

We are planning to develop a fully-fledged re-stricted natural language for conceptual modellingof information systems. We want to use this lan-guage as a specification language that will helpthe domain experts to write the system require-ments precisely. We are also planning to developa conceptual modelling framework that will allowusers to write specifications in RNL and will gen-erate conceptual models from the specification.This tool will also facilitate the verbalization ofthe conceptual models and allow users to manipu-late the models in a round tripping fashion (fromspecification to conceptual models and conceptualmodels to specifications). This approach has sev-eral advantages for the conceptual modelling pro-cess: Firstly, it will use a common formal repre-sentation to generate different conceptual models.Secondly, it will make the conceptual modellingprocess easy to understand by providing a frame-work to write specifications, generate visualiza-tions, and verbalizations. Thirdly, it is machine-processable like other logical approaches and sup-port verification; furthermore, verbalization will

8http://www.conceptualmodeling.org/ConceptualModeling.html

facilitate better understanding of the modellingprocess which is only available in limited formsin the current conceptual modelling frameworks.

8 Conclusion

In this paper we demonstrated that an RNL canserve as a high-level specification language forconceptual modelling, in particular for specifyingentity-relationship models. We described an ex-periment that shows how we can support the pro-posed modelling approach. We translated a spec-ification of a conceptual model written in RNLinto an executable description logic program thatis used to generate the entity-relationship model.Our RNL is supported by automatic consistencychecking, and is therefore very suitable for for-malizing and verifying conceptual models. Thepresented approach is not limited to a particularmodeling framework and can be used apart fromentity-relationship models also for object-orientedmodels and object role models. Our approach hasthe potential to bridge the gap between a seem-ingly informal specification and a formal represen-tation in the domain of conceptual modelling.

ReferencesSikha Bagui. 2009. Mapping OWL to the entity re-

lationship and extended entity relationship models.International Journal of Knowledge and Web Intel-ligence 1(1-2):125–149.

Daniela Berardi, Diego Calvanese, and Giuseppe DeGiacomo. 2005. Reasoning on uml class diagrams.Artificial Intelligence 168(1):70–118.

Anders Berglund, Scott Boag, Don Chamberlin,Mary F Fernandez, Michael Kay, Jonathan Ro-bie, and Jerome Simeon. 2003. XML Path Lan-guage (XPath). World Wide Web Consortium (W3C)https://www.w3.org/TR/xpath20/.

51

Page 9: Specifying Conceptual Models Using Restricted Natural Language · proach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted

Peter Bernus, Kai Mertins, and Gunter Schmidt. 2013.Handbook on architectures of information systems.Springer Science & Business Media.

Steven Bird, Ewan Klein, and Edward Loper. 2009.Publications received. Computational Linguistics36:283–284.

Sara Brockmans, Raphael Volz, Andreas Eberhart, andPeter Loffler. 2004. Visual modeling of owl dl on-tologies using uml. In International Semantic WebConference. Springer, pages 198–213.

Diego Calvanese. 2013. Description Logics for Con-ceptual Modeling Forms of reasoning on UML ClassDiagrams. EPCL Basic Training Camp 2012-2013.

Brian Davis, Ahmad Ali Iqbal, Adam Funk, ValentinTablan, Kalina Bontcheva, Hamish Cunningham,and Siegfried Handschuh. 2008. Roundtrip ontol-ogy authoring. In International Semantic Web Con-ference. Springer, pages 50–65.

Ronald Denaux, Vania Dimitrova, Anthony G Cohn,Catherine Dolbear, and Glen Hart. 2009. Rabbit toowl: ontology authoring with a cnl-based tool. InInternational Workshop on Controlled Natural Lan-guage. Springer, pages 246–264.

Pablo R Fillottrani, Enrico Franconi, and Sergio Tes-saris. 2012. The icom 3.0 intelligent conceptualmodelling tool and methodology. Semantic Web3(3):293–306.

Enrico Franconi, Alessandro Mosca, and Dmitry Solo-makhin. 2012. Orm2: formalisation and encodingin owl2. In OTM Confederated International Con-ferences” On the Move to Meaningful Internet Sys-tems”. Springer, pages 368–378.

Joseph Frantiska. 2018. Entity-relationship diagrams.In Visualization Tools for Learning Environment De-velopment, Springer, pages 21–30.

Adam Funk, Valentin Tablan, Kalina Bontcheva,Hamish Cunningham, Brian Davis, and SiegfriedHandschuh. 2007. Clone: Controlled language forontology editing. In The Semantic Web, Springer,pages 142–155.

Birte Glimm, Ian Horrocks, Boris Motik, Giorgos Stoi-los, and Zhe Wang. 2014. HermiT: An OWL 2 Rea-soner. Journal of Automated Reasoning 53(3):245–269.

Stephen C Guy and Rolf Schwitter. 2017. The peng aspsystem: architecture, language and authoring tool.Language Resources and Evaluation 51(1):67–92.

Terry Halpin. 2009. Object-role modeling. In Encyclo-pedia of Database Systems, Springer, pages 1941–1946.

Tobias Kuhn. 2008. Acewiki: A natural and expressivesemantic wiki. arXiv preprint arXiv:0807.4618.

Tobias Kuhn. 2014. A survey and classification of con-trolled natural languages. Computational Linguis-tics 40(1):121–170.

Jean-Baptiste Lamy. 2017. Owlready: Ontology-oriented programming in python with automaticclassification and high level constructs for biomed-ical ontologies. Artificial intelligence in medicine80:11–28.

Edward Loper and Steven Bird. 2002. Nltk: The natu-ral language toolkit. In Proceedings of the ACL-02Workshop on Effective tools and methodologies forteaching natural language processing and computa-tional linguistics-Volume 1. Association for Compu-tational Linguistics, pages 63–70.

Carsten Lutz. 2002. Reasoning about entity relation-ship diagrams with complex attribute dependencies.In Proceedings of the International Workshop inDescription Logics 2002 (DL2002), number 53 inCEUR-WS (http://ceur-ws.org). pages 185–194.

Gerard O´ Regan. 2017. Unified modelling language.In Concise Guide to Software Engineering, Springer,pages 225–238.

Antoni Olive. 2007. Conceptual Modeling of Informa-tion Systems. Springer-Verlag, Berlin, Heidelberg.

Richard Power. 2012. Owl simplified english: a finite-state language for ontology editing. In Interna-tional Workshop on Controlled Natural Language.Springer, pages 44–60.

Barker Richard. 1990. CASE Method: Entity Relation-ship Modelling. Addition-Wesley Publishing Com-pany, ORACLE Corporation UK Limited.

Rolf Schwitter. 2010. Controlled natural languages forknowledge representation. In Proceedings of the23rd International Conference on ComputationalLinguistics: Posters. Association for ComputationalLinguistics, pages 1113–1121.

52