Kontrast@TKE 2012

Post on 17-May-2015

This paper is part of a study focusing on the terminological and socio-organizational analysis of a corpus of 18 national and international standards, written in English, in the domains of business continuity activity management and risk management. The aim is to determine whether lobbying by certain countries seeking to impose their own national standards is a decisive element in standardization. First, we present the building of a new tool, called KONTRAST, designed to exploit the terminological variants in a non-stabilized terminological domain. Then we describe the workflow to build an RDF/SKOS/OWL base from an XML glossary and a use case to illustrate the ability of KONTRAST to detect influence networks.

KONTRAST

TKE 2012

Brigitte Juanals

Martin Lafréchoux

Jean-Luc Minel

Hi.

I am Martin Lafréchoux.

Photo : http://www.flickr.com/photos/daynoir/2180507211/

Standardization and Global Security

The work I am about to present was part of a three-year project called NOTSEG (www.notseg.fr, ANR-CSOG 2009).

We acknowledge funding from the French National Research Agency (ANR)

KONTRAST is a termino-ontological resource designed to represent and analyze the vocabulary used in international management standards

In short...

Photo : http://www.flickr.com/photos/natureindyablogspotcom/3038070680

I. Context

II. Model

III. Use Case

I will briefly describe the context of our work, i.e. the challenges faced by terminology in management standards. I’ll then describe the representation model we came up with to address these challenges, as well as our workflow. The last part of this talk will show you a quick use case.

I. Context

Standards & Terminology

Kontrast was designed to address the specific issues of terminology in the context of management standards.

Let me begin by giving you an overview of these specificities.

Standards

Standards can refer to many things.

The first thing that comes to mind is probably something very down to earth, like power outlets or the size of vegetables.

That’s not what I’ll be talking about today. I’ll talk about management standards, and more precisely I’ll talk about business continuity.

Illustration : http://www.flickr.com/photos/double-m2/4341910416/

Business Continuity: The activity performed by an organization to ensure that critical functions will be available to those who need them even in the event of a disaster.

Just a quick reminder.

Business continuity is a subtask of risk management.

• Management standards are not about physical objects

• They deal with the abstract — processes, business rules, concepts, methods

• Standards include a ‘Terms and Definitions’ (T&D) section to alleviate ambiguity

Generally speaking, management standards use natural language to standardize abstract material: rules, processes, and so on.

Given that there is nothing tangible, concrete to refer to, only words and abstractions, these standards have to include some manner of terminology. That’s the use of their T&D section.

This is what we studied.

Here you can see the beginning of the T&D of ISO 31000:2009. It’s a glossary, either thematic or alphabetical.

We chose to study T&D because we witnessed some heated discussion about them during the writing process of certain ISO standards. They seemed to be some kind of focal point where we could observe the different influences at play. This was confirmed by the AFNOR experts we worked with: it’s not easy to agree on a terminology when writing international standards.

Consensus

If I oversimplify things a bit, the international standardization process goes something like this:

Illustration : http://www.flickr.com/photos/double-m2/4324115629/

• Each standards organization can define its own vocabulary

• International standards have to choose between several concurrent vocabularies

• The writing process follows a so-called ‘consensual procedure’.

On a given topic, several countries write their own national standard. Each one comes with its own T&D.

When ISO decides to write an international standard on this topic, these countries send delegations and, of course, each country wants its own terminology to prevail, as this would give a competitive advantage to those who have already adopted its national standard.

These vocabularies compete with one another, and none of them can be seen as more valid than the others.

Experts

This is where the ‘experts’ come in.

‘Experts’ is a generic term for the people sent to ISO by each organisation and country. Most are consultants or come from a corporate background.

When writing a standard, a so-called ‘consensual procedure’ is used, where one expert (secretary) will review each definition proposal and every other expert can ask for modifications. There is no formal definition of this procedure.

Illustration : http://www.flickr.com/photos/beatnic/3683822225/

• The way a standard will be implemented depends largely on its T&D

• T&D are a product of the notional systems of the experts who wrote them

• T&D are an economic, sometimes political issue

Since the way a standard will be implemented depends largely on its T&D, the T&D can become economic and political issues. Our hypothesis is that the power plays and maneuvers that took place when writing international standards can be traced in their T&D.

Authority?

International standardization has one major specificity compared to other terminology-heavy fields - like, say, industry. No one has authority.

Illustration : http://www.flickr.com/photos/double-m2/4324611290/

• ISO has no authority over national standardization organisations

• ISO vocabularies neither replace nor supersede other terminologies

• ISO itself is not monolithic. Different subgroups coexist within ISO.

Standards are not laws. They are only references. An organization is free to use a standard or not.

In the same way, ISO has no authority over national organisations.

“resilience”

“The ability of an organization to resist being affected by an incident”

“The adaptive capacity of an organization in a complex and changing environment”

ISO DIS 22300:2011

ISO/IEC 27031:2011

For example here are two definitions of the word ‘resilience’ in two ISO standards written in 2011. Resilience is a key concept in business continuity.

But more on that later.

Borrowings & References

As I said, each country *can* create its own vocabulary. That does not mean that they always do.

Photo : http://www.flickr.com/photos/linneberg/6976347269/

• Creating a new terminology is a tedious and costly process

• Standards frequently recycle or reuse other definitions

• These quotes and reuses sometimes go unacknowledged

As I am sure you are all aware, creating a terminology can be a long, daunting, costly process.

Often a standard will reuse the T&D of an existing standard, in part or in total, or at least refer to it. These quotes and borrowings create a complex network, which is what we want our system to track.

Reuse

The simplest case is a straight reuse. In effect, it is similar to ‘importing’ a library in a programming language.

Source : ISO DIS 22301:2010

Quote

When a standard quotes a single definition, the ID of the original standard is in brackets.
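A minimal Python sketch of how such an acknowledged quote could be detected: it looks for a trailing square-bracketed reference and keeps the part before the first comma as the identifier of the quoted standard. The function name `quoted_standard` and the regular expression are illustrative assumptions, not the scripts used to build Kontrast. The sample string is one of the corpus definitions of ‘vulnerability’.

```python
import re

# A trailing "[...]" at the end of a definition, e.g. "[ISO/IEC Guide 73:2002]".
_TRAILING_REFERENCE = re.compile(r"\[([^\]]+)\]\s*$")


def quoted_standard(definition):
    """Return the ID of the standard quoted in a definition, or None."""
    match = _TRAILING_REFERENCE.search(definition)
    if match is None:
        return None
    # Keep only the identifier, dropping any title or clause number after it.
    return match.group(1).split(",")[0].strip()


asis_definition = (
    "Intrinsic properties of something that create susceptibility to a "
    "source of risk (3.53) that can lead to a consequence. "
    "[ISO/IEC Guide 73:2002]"
)
print(quoted_standard(asis_definition))
```

A definition without a trailing bracketed reference yields `None`, which is the case an unacknowledged quote would fall into.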

Modified Quote

Some quotes are shortened or modified.

Quote?

Some even go unacknowledged, which is, of course, of particular interest to us.

Guide 81 vs. BS 25999 1 - impact

Reference

Some definitions also refer to another one.

• A wide range of terminological systems coexist

• They are interlinked by a complex network of influence, borrowings, adaptations and reuses

• How can we represent them simultaneously without alignment?

Photo : http://www.flickr.com/photos/moofbong/4240137966/

II. Model

Here is the termino-ontological model we designed to address these issues.

Photo : http://www.flickr.com/photos/esm723/3573226450/

A contrastive ontological glossary

As you will see, it is not a ‘proper’ ontoterminology, so we call it a contrastive ontological glossary instead.

• A two-part model:

- Terminological Data

- Structural Data

It’s a 2-part structure.

Terminological Data

Photo : http://www.flickr.com/photos/fijneman/2971217479/

• Terminology in standards offers unique challenges to knowledge engineering

• How can several semi-identical concepts coexist?

• We built on the operational properties of OWL ontologies with a twist: an unorthodox definition of the ‘concept’

The main challenge is that we had to represent several conflicting concepts at the same time, without alignment and without any central authority to choose the ‘right one’.

In Kontrast a ‘concept’ is the relationship between a term, a context of use and a definition.

“An unstable condition involving an impending abrupt or significant change...”@en

Definition

This allowed us to design a scattered, decentralised model, where the individuals representing concepts are nothing more than the reification of a ternary relationship between a term, a standard and a definition.

Please allow me to reiterate: this was a technical design decision.
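As a rough illustration of this design decision, the sketch below models a concept as a plain record tying a term to the standard that defines it and to the definition given there. The class name `Concept` and the `term@standard` identifier scheme are assumptions made for the example, not the naming used in Kontrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Reified ternary relationship: a term, a standard, a definition."""
    term: str
    standard: str
    definition: str

    @property
    def concept_id(self):
        # Hypothetical ID scheme: the term plus the standard that defines it.
        return f"{self.term}@{self.standard}"


concepts = [
    Concept(
        "resilience",
        "ISO/IEC 27031:2011",
        "The ability of an organization to resist being affected by an incident",
    ),
    Concept(
        "resilience",
        "ASIS SPC.1:2009",
        "The adaptive capacity of an organization in a complex and changing "
        "environment.",
    ),
]

# One term, two coexisting concepts: no alignment, no 'right' definition.
for concept in concepts:
    print(concept.concept_id)
```

Because the standard is part of the identity, two conflicting definitions of the same term never collide and neither has to be chosen over the other.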

One of the first advantages of this definition was that it worked well with the concept as defined by SKOS.

SKOS is designed to represent several thesauri that may share terms and/or definitions, which worked well for us.

Illustration : http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/

Ontology Building

Here is how we worked.

• A linear XML glossary is converted to RDF/OWL

• XSLT stylesheets perform most of the work

• Python scripts extract relationships between concepts

In a different part of the NOTSEG project, a glossary was manually compiled from the 20 standards of our corpus. We then applied automatic and manual processing to this glossary.

I’ll walk you through the different steps.

Here is an entry of the XML glossary.

31000:2009

Here is a diagram representing the same entry in Kontrast.

First the term.

31000:2009

It is used in several places in the ontology: as a term (left), as part of the ID for the concept (center), and as a label (right).

Definition

31000:2009

Transposed as a concept property

The standards in which the concept appears...

31000:2009

... are represented as instances of skos:ConceptScheme, which makes them independent thesauri.
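The mapping just described can be sketched in a few lines of Python. This is a stand-in for illustration only: the actual conversion is done with XSLT, the XML layout of a glossary entry is not shown in this transcript, and the element names (`entry`, `term`, `standard`, `definition`), the `ex:` prefix and the helper names are all hypothetical. The output is a Turtle fragment without its `@prefix` declarations.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry layout; the real glossary schema may differ.
ENTRY_XML = """
<entry>
  <term>vulnerability</term>
  <standard>ISO DIS 22300:2011</standard>
  <definition>intrinsic properties of something resulting in susceptibility to a risk source that can lead to an event with a consequence</definition>
</entry>
"""


def slug(text):
    """Turn a label into a token that is safe inside a prefixed name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def entry_to_turtle(xml_text):
    """Convert one glossary entry into a Turtle fragment (no escaping done)."""
    entry = ET.fromstring(xml_text)
    term = entry.findtext("term").strip()
    standard = entry.findtext("standard").strip()
    definition = entry.findtext("definition").strip()
    scheme = f"ex:{slug(standard)}"
    # The term is reused as part of the concept ID and as its label.
    concept = f"ex:{slug(term)}_{slug(standard)}"
    return "\n".join([
        f"{scheme} a skos:ConceptScheme .",
        f"{concept} a skos:Concept ;",
        f'    skos:prefLabel "{term}"@en ;',
        f'    skos:definition "{definition}"@en ;',
        f"    skos:inScheme {scheme} .",
    ])


print(entry_to_turtle(ENTRY_XML))
```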

Relationship Extraction

The next step is extracting relationships.

“vulnerability”

ISO Guide 73:2009

intrinsic properties of something resulting in susceptibility to a risk source (3.5.1.2) that can lead to an event with a consequence (3.6.1.3)

AS/NZS 5050:2010

Intrinsic properties of something resulting in susceptibility to a risk source that can lead to an event with a consequence. [ISO Guide 73:2009, Risk Management—Vocabulary, definition 3.6.1.6]

ASIS SPC.1:2009

Intrinsic properties of something that create susceptibility to a source of risk (3.53) that can lead to a consequence. [ISO/IEC Guide 73:2002]

ISO DIS 22300:2011

intrinsic properties of something resulting in susceptibility to a risk source that can lead to an event with a consequence

Another example.

Here are four definitions of ‘vulnerability’.

To compare them, we normalize the text — i.e. get rid of everything in brackets, of punctuation and capital letters.

Three of them match: we create a skos:exactMatch relationship.
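The normalization and exact-match test can be sketched as follows. The function names and the exact regular expressions are assumptions made for this example, not the project's scripts. The four definitions are the ones shown above.

```python
import re
from collections import defaultdict
from itertools import combinations

DEFINITIONS = {
    "ISO Guide 73:2009": (
        "intrinsic properties of something resulting in susceptibility to a "
        "risk source (3.5.1.2) that can lead to an event with a consequence "
        "(3.6.1.3)"
    ),
    "AS/NZS 5050:2010": (
        "Intrinsic properties of something resulting in susceptibility to a "
        "risk source that can lead to an event with a consequence. "
        "[ISO Guide 73:2009, Risk Management\u2014Vocabulary, "
        "definition 3.6.1.6]"
    ),
    "ASIS SPC.1:2009": (
        "Intrinsic properties of something that create susceptibility to a "
        "source of risk (3.53) that can lead to a consequence. "
        "[ISO/IEC Guide 73:2002]"
    ),
    "ISO DIS 22300:2011": (
        "intrinsic properties of something resulting in susceptibility to a "
        "risk source that can lead to an event with a consequence"
    ),
}


def normalize(text):
    """Drop bracketed parts, punctuation and capital letters."""
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", text)  # (...) and [...]
    text = re.sub(r"[^\w\s]", " ", text.lower())        # punctuation, case
    return " ".join(text.split())                       # collapse whitespace


def exact_matches(definitions):
    """Return sorted pairs of standards whose normalized text is identical."""
    groups = defaultdict(list)
    for standard, definition in definitions.items():
        groups[normalize(definition)].append(standard)
    pairs = []
    for standards in groups.values():
        pairs.extend(combinations(sorted(standards), 2))
    return sorted(pairs)


for left, right in exact_matches(DEFINITIONS):
    print(f"{left} skos:exactMatch {right}")
```

On these four definitions, the sketch links ISO Guide 73:2009, AS/NZS 5050:2010 and ISO DIS 22300:2011 pairwise and leaves ASIS SPC.1:2009 out.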

The fourth one is slightly different. It’s very close, but we have no way to confirm it automatically. On such short texts, a variation of a few words can be huge. A human analysis is still needed.

The best we can do is to create a temporary file for humans to check afterwards.
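One possible way to build such a review file is sketched below. How candidates are actually selected is not described in this talk, so the word-level similarity from Python's `difflib`, the `0.6` threshold and the CSV layout are all assumptions. The input is taken to be already normalized (lowercase, no brackets, no punctuation).

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations

# Already-normalized definitions of 'vulnerability'.
NORMALIZED = {
    "ISO Guide 73:2009": (
        "intrinsic properties of something resulting in susceptibility to a "
        "risk source that can lead to an event with a consequence"
    ),
    "ASIS SPC.1:2009": (
        "intrinsic properties of something that create susceptibility to a "
        "source of risk that can lead to a consequence"
    ),
}


def review_candidates(definitions, threshold=0.6):
    """List non-identical pairs whose word-level similarity reaches threshold."""
    rows = []
    for (std_a, def_a), (std_b, def_b) in combinations(sorted(definitions.items()), 2):
        if def_a == def_b:
            continue  # identical texts are exact matches, not review candidates
        ratio = SequenceMatcher(None, def_a.split(), def_b.split()).ratio()
        if ratio >= threshold:
            rows.append((std_a, std_b, round(ratio, 2)))
    return rows


def write_review_file(rows, handle):
    """Write candidate pairs as CSV for a human to accept or reject."""
    writer = csv.writer(handle)
    writer.writerow(["standard_a", "standard_b", "similarity"])
    writer.writerows(rows)


buffer = io.StringIO()  # stands in for the temporary file
write_review_file(review_candidates(NORMALIZED), buffer)
print(buffer.getvalue())
```

The score only ranks candidates for the reviewer. Deciding whether a pair is a close match or merely related remains a human judgment.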

            exactMatch   closeMatch   relatedMatch
Recall      1.0          0.38         0.21
Precision   1.0          1.0          1.0

Our tests are designed for maximum precision: they produce no false positives. The obvious downside is that recall is very low for close and related matches.

• 20 standards

• 291 terms

• 649 concepts

• 1107 matching relationships

  • 486 skos:relatedMatch

  • 85 skos:exactMatch

  • 535 skos:closeMatch

Structural Data

Briefly, here is the other part of the resource.

Photo : http://www.flickr.com/photos/hindrik/1919291052/

Data about...

•the standards: release date, version, current status, reach...

•the standardization process: publishers, working groups, institutions...

It contains mostly metadata about the standards and the writing process.

This data is linked with the terminological data through the individuals representing standards.

We used Dublin Core (dcterms) whenever possible to ensure maximum interoperability.

Standards are revised regularly. Through the borrowings and quotes, an ‘older’ definition can remain in use even if a new version of the standard has been published. So we have to represent several versions of the same standard at the same time.

We also used dcterms.
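A small sketch of how successive versions might be described. The talk does not say which dcterms properties Kontrast uses for versioning. `dcterms:isVersionOf`, `dcterms:issued` and `dcterms:replaces` are real Dublin Core terms, but their use here is an assumption, as are the `ex:` identifiers and the function name.

```python
def version_triples(work, versions):
    """Build (subject, predicate, object) triples for successive versions.

    `versions` is a list of (identifier, year) tuples, oldest first.
    """
    triples = []
    previous = None
    for identifier, year in versions:
        triples.append((identifier, "dcterms:isVersionOf", work))
        triples.append((identifier, "dcterms:issued", str(year)))
        if previous is not None:
            # Both versions stay in the graph; the newer one points back.
            triples.append((identifier, "dcterms:replaces", previous))
        previous = identifier
    return triples


# Hypothetical identifiers for the two versions of Guide 73 seen in the corpus.
triples = version_triples(
    "ex:ISO_Guide_73",
    [("ex:ISO_IEC_Guide_73_2002", 2002), ("ex:ISO_Guide_73_2009", 2009)],
)
for triple in triples:
    print(" ".join(triple), ".")
```

Keeping the 2002 version as its own individual is what lets a definition quoted from it, as in ASIS SPC.1:2009, still point at the right source.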

Some standards of our corpus are present in DBpedia. We used owl:sameAs or dcterms:isPartOf assertions to connect Kontrast with Linked Data.

• Decentralized, simple and extensible model

• Uses standard semantic web vocabularies

• Connected to Linked Data

Photo : http://www.flickr.com/photos/sperkyajachtu/5497757852/

III. Use Case

Photo : http://www.flickr.com/photos/daynoir/2180507271/

‘resilience’

Kontrast does not have its own GUI. We used third-party tools such as RDF Gravity or Ontograf to display a graphical representation of Kontrast.

“resilience”

“The ability of an organization to resist being affected by an incident”

“The adaptive capacity of an organization in a complex and changing environment”

ISO DIS 22300:2011

ISO/IEC 27031:2011

Let’s get back to resilience.

“resilience”

Here are the concepts using the term resilience in Kontrast

In yellow, skos:closeMatches, in green skos:exactMatches, in brown skos:relatedMatches.

Capture : Ontograf (Protégé plug-in)

“resilience”

These three nodes are British standards — the two parts of the BS25999 standard, and the associated good practice guide.

“resilience”

“The ability of an organization to resist being affected by an incident”

They use the same definition of ‘resilience’.

The UK has worked on emergency planning since the 1980s. It has been a business continuity leader ever since, and its standards have long been used as international references.

It led the BSI to a prominent position within ISO for business continuity standards.

“resilience”

“The ability of an organization to resist being affected by an incident”

ISO 27031 was recently published and uses the British definition.

“resilience”

On the other side of the graph are definitions under American influence.

“resilience”

Towards the center you can see the ASIS SPC.1:2009 definition.

The adaptive capacity of an organization in a complex and changing environment. - NOTE 1: Resilience is the ability of an organization to resist being affected by an event or the ability to return to an acceptable level of performance in an acceptable period of time after being affected by an event.

“resilience”

ASIS SPC.1:2009

«The adaptive capacity of an organization in a complex and changing environment.»

At first, the definition seems different. But if you take a closer look...

The adaptive capacity of an organization in a complex and changing environment. - NOTE 1: Resilience is the ability of an organization to resist being affected by an event or the ability to return to an acceptable level of performance in an acceptable period of time after being affected by an event.

“resilience”

ASIS SPC.1:2009

The first note reproduces the British definition. In 2009, when the ASIS standard was published, the British influence was still very strong, and forgoing the British definition completely would have handicapped the standard.

“The adaptive capacity of an organization in a complex and changing environment”

“resilience”

After the publication of this standard, the US used a different strategy. They went around the British influence and pushed to have their definition of ‘resilience’ adopted in broader standards: the ISO 31000 series, which deals with risk management.

With the help of an Israeli expert, the US managed to get their definition into the 2009 version of ISO Guide 73 (rightmost node). This was a good move because the definitions of ISO Guides are often quoted or borrowed, as we can see here at the bottom of the graph (the Australia / New Zealand standard).

“The adaptive capacity of an organization in a complex and changing environment, to achieve the organizations objectives NOTE 1 Resilience is the ability of an organization to manage the risks of events”

“resilience”

In 2011, the US pushed for a new definition to be adopted in the ISO 22300 series. This caused quite a stir, as the 22300 series is a purely BC (business continuity) standard series, where the British concepts usually prevail.

The debate is still ongoing.

“resilience”

And that’s how you end up with two conflicting definitions of an important concept in two standards written within the same organization, during the same year.

“resilience”

“The ability of an organization to resist being affected by an incident”

“The adaptive capacity of an organization in a complex and changing environment”

ISO DIS 22300:2011

ISO/IEC 27031:2011

These definitions are important because they reflect two different visions of business continuity: in short, the US / Israeli position is about the planning and reactive capacities of an organisation, whereas the British experts believe that risk assessment is costly and not very realistic, as it is impossible to anticipate all possible risks.

Photos : http://www.flickr.com/photos/pmillera4/6366227011/ & http://www.flickr.com/photos/adrianclarkmbbs/3050195566/

Conclusion

Photo : http://www.flickr.com/photos/bruceberrien/4262228892/

• Standardization terminology offers unique challenges to knowledge engineering

• Influence can be traced in terms and definitions and Kontrast can be a useful tool to assist human analysis

• Many possible optimizations: use Lemon instead of SKOS, automate relationship extraction...

SKOS was great for us because it was readily available and simple to set up, but we are now beginning to feel its limits.

Lemon would allow us to push our idea further.

Thank you