Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Page 1: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Marcia Zeng

NKOS (Networked Knowledge Organization Systems/Services)

My participation in OOR: introducing the work done by the NKOS folks

Page 2: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

About NKOS (Networked Knowledge Organization Systems/Services)

– Informal network for enabling knowledge organization systems (KOS), such as classification systems, thesauri, gazetteers, ontologies, and folksonomies, as networked interactive information services to support the description and retrieval of diverse information resources through the Internet

– Ongoing series of NKOS workshops

• JCDL (Joint Conference on Digital Libraries), US

• ECDL (European Conference on Research and Advanced Technology for Digital Libraries)

• International Conference on Dublin Core and Metadata (DC)

– NKOS website http://nkos.slis.kent.edu/

Page 3: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Standards

• ANSI/NISO Z39.19-2005 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies

• Forthcoming: ISO 25964 Structured Vocabularies for Information Retrieval (based on the published BS 8723)

– Part 1 – Definitions, symbols and abbreviations

– Part 2 – Thesauri

– Part 3 – Vocabularies other than thesauri

– Part 4 – Interoperability between vocabularies

– Part 5 – Interoperation between vocabularies and other components of information storage and retrieval systems

Leader: Stella Dextre Clarke, Information Consultant, UK

Page 4: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Terminology Registries and Services (1): HILT

Funded by the UK Joint Information Systems Committee (JISC)

http://hilt.cdlr.strath.ac.uk/index.html

Dennis Nicholson, University of Strathclyde

Large structured vocabularies, each containing thousands of controlled terms/classes and the relationships among terms/classes.

Page 5: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

• HILT phase I: mapping between schemes

• HILT phase II: terminologies server

• HILT phase III: M2M pilot demonstrator

• HILT phase IV: transition to service testbed and future requirements study

Page 6: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Terminology Registries and Services (2): OCLC Terminology Services

OCLC Research Office: Diane Vizine-Goetz (Lead)

Research: http://www.oclc.org/research/projects/termservices/

Service: http://www.oclc.org/terminologies/default.htm

Page 7: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Accessed through the MS Office Research Task Pane, the service provides 10 vocabularies for tagging, searching, translation, etc.

Page 8: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Terminology Registries and Services (3): NSDL Registry

http://metadataregistry.org/vocabulary/list.html

Funded by the NSF NSDL Project

Aims: supporting registration of schemes and schemas; supporting the machine mapping of relationships among terms and concepts in those schemes and schemas.

• University of Washington: Stuart A. Sutton

• Cornell University: Diane Hillmann, Jon Phipps

Page 9: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Terminology Registries and Services (4): STAR

Semantic Technologies for Archaeological Resources (2007-2010)

• Funded by AHRC (Arts & Humanities Research Council)

• Doug Tudhope, University of Glamorgan

• Aims to develop new methods for linking digital archive databases, vocabularies, and the associated grey literature, exploiting the potential of a high-level core ontology and natural language processing techniques.

STAR website http://hypermedia.research.glam.ac.uk/kos/star/

SKOS terminology services http://hypermedia.research.glam.ac.uk/kos/terminology_services/

Page 10: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Terminology Registries and Services (5): TRSS

• Analyzes issues related to the potential delivery of a Terminology Registry as a shared infrastructure service within the JISC (UK Joint Information Systems Committee) Information Environment.

• Focuses mainly on a KOS registry, but aims to stay compatible with more formal AI ontology registries to the extent practical without imposing excessive overhead.

http://www.ukoln.ac.uk/projects/trss/

Lead institution: UKOLN at the University of Bath

Project partners: University of Glamorgan, Hypermedia Research Unit; OCLC Office of Research, USA

Page 11: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Discussions: Reusing thesauri and taxonomies for ontologies

KOS: large, structured vocabularies; good representations for domain knowledge.

Basic functional requirements:

• eliminating ambiguity

• controlling synonyms

• establishing relationships among terms where appropriate

• testing and validation of terms
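The first, second, and fourth requirements can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the vocabulary entries and the function names `resolve` and `unknown_terms` are made up for illustration and do not come from any of the vocabularies or services named in these slides. Synonyms are controlled by pointing every entry term at one preferred term, and homographs are kept apart with parenthetical qualifiers.

```python
# Hypothetical entry vocabulary: every entry term points to one preferred term.
USE = {
    "cars": "cars",
    "automobiles": "cars",
    "motorcars": "cars",
    # Homographs are disambiguated with parenthetical qualifiers.
    "mercury (planet)": "mercury (planet)",
    "mercury (metal)": "mercury (metal)",
    "quicksilver": "mercury (metal)",
}


def resolve(term):
    """Return the preferred term(s) for a user-supplied term.

    One result means the term is controlled, several results mean it is
    an ambiguous homograph, and an empty list means it is not in the
    vocabulary at all.
    """
    key = term.strip().lower()
    if key in USE:
        return [USE[key]]
    # Unqualified homograph: offer every qualified candidate.
    return sorted({pref for entry, pref in USE.items()
                   if entry.startswith(key + " (")})


def unknown_terms(terms):
    """Validation step: list the terms the vocabulary cannot resolve."""
    return [t for t in terms if not resolve(t)]
```

Under these assumptions, `resolve("Automobiles")` yields the single preferred term, `resolve("Mercury")` yields both qualified candidates so that a person or application can choose between them, and `unknown_terms` flags candidate terms that still need review.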

Page 12: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Issues (1)

1. Thesauri and taxonomies may apply looser control to hierarchical relationships.

Terms are selected based on

• literary warrant: the natural language used to describe content objects

• user warrant: the language of users

• organizational warrant: the needs and priorities of the organization

-- i.e., not consistently based on logic, not always highly structured.
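One consequence of looser hierarchical control can be sketched in a few lines. Thesaurus guidelines such as Z39.19 distinguish generic, instance, and whole-part hierarchical relationships, but a vocabulary may record all of them as plain broader-term links. The fragment below is a hypothetical example, not taken from any real thesaurus, and the function names are assumptions. It shows what happens if every broader-term link is reused as if it were a logical is-a relation.

```python
# Hypothetical thesaurus fragment: (narrower term, broader term, kind of link).
BROADER = [
    ("sports cars", "cars", "generic"),          # a sports car is a kind of car
    ("cars", "vehicles", "generic"),             # a car is a kind of vehicle
    ("Ohio", "United States", "whole-part"),     # Ohio is part of the United States
    ("United States", "countries", "instance"),  # the United States is one country
]


def ancestors(term, links):
    """Follow broader-term links upward, treating every link as transitive."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for narrower, broader, _kind in links:
            if narrower == current and broader not in found:
                found.add(broader)
                frontier.append(broader)
    return found


def is_a_ancestors(term, links):
    """Follow only generic links, the ones that behave like subclass relations."""
    generic_only = [link for link in links if link[2] == "generic"]
    return ancestors(term, generic_only)
```

With this data, `ancestors("Ohio", BROADER)` includes "countries", which would be a wrong conclusion if read as "Ohio is a country", while `is_a_ancestors("Ohio", BROADER)` is empty. Reusing a thesaurus as an ontology therefore usually means first deciding which kind of relationship each link represents.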

Page 13: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Issues (2)

2. Common interchange format is SKOS

-- i.e., not OWL.
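A minimal sketch of what this difference looks like in practice: in SKOS a thesaurus entry is described as a `skos:Concept` with labels and `skos:broader` links, not as an OWL class with subclass axioms. The `Concept` class, the `to_skos_turtle` helper, and the example.org URIs are illustrative assumptions; only the `skos:` property names and the namespace come from the SKOS vocabulary. Labels are assumed to contain no quotation marks, so no escaping is attempted.

```python
from dataclasses import dataclass, field

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


@dataclass
class Concept:
    """One thesaurus entry; all values used with it here are made up."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)  # non-preferred synonyms
    broader: list = field(default_factory=list)     # URIs of broader concepts


def to_skos_turtle(concept, lang="en"):
    """Render one concept as a SKOS description in Turtle syntax."""
    pairs = [
        ("a", "skos:Concept"),
        ("skos:prefLabel", f'"{concept.pref_label}"@{lang}'),
    ]
    pairs += [("skos:altLabel", f'"{label}"@{lang}')
              for label in concept.alt_labels]
    pairs += [("skos:broader", f"<{uri}>") for uri in concept.broader]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"<{concept.uri}>\n    {body} ."


example = Concept(
    uri="http://example.org/kos/aging",  # hypothetical URI
    pref_label="Aging",
    alt_labels=["Ageing"],
    broader=["http://example.org/kos/life-cycle"],  # hypothetical URI
)
document = SKOS_PREFIX + "\n\n" + to_skos_turtle(example)
```

The output describes the concept as an individual of type `skos:Concept`. Nothing in it states a subclass relationship, so turning such data into an OWL class hierarchy is a modelling decision rather than a format conversion.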

Page 14: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

[Borrowing an example from SWED]:

SWED is a prototype directory of environmental organizations and projects

Source: http://www.swed.org.uk/swed/index.html

Page 15: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

[from SWED]:

Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/06 http://isegserv.itd.rl.ac.uk/public/skos/press/cistrana200602/taxonomies-semanticweb.ppt

Page 16: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

[from SWED]:

• cost/benefit tradeoffs involved in investing in semantics …

Source: Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/06 http://isegserv.itd.rl.ac.uk/public/skos/press/cistrana200602/taxonomies-semanticweb.ppt

Page 17: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Issues (3)

3. Concept mapping

• Current ontology registries/repositories and terminology services usually do not provide concept-based mapping.

• E.g., searching for “aging” in a large ontology repository returned classes such as “biological imaging methods”, “Imaging device”, and “lavaging”.

Note: the actual screen shot is omitted here to protect the reputation of the registry
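The behaviour in this example is what plain substring matching on class labels produces. The sketch below contrasts it with whole-word matching and with a lookup that goes through a concept identifier. It is an illustration only: the first three labels are the ones quoted above, while the remaining labels, the concept identifier "C001", and the entry-term table are hypothetical and do not describe how any particular registry works.

```python
import re

CLASS_LABELS = [
    "biological imaging methods",  # quoted in the slide
    "Imaging device",              # quoted in the slide
    "lavaging",                    # quoted in the slide
    "aging",                       # hypothetical
    "Aging process",               # hypothetical
]

# Hypothetical synonym control: entry terms mapped to one concept identifier.
ENTRY_TERMS = {"aging": "C001", "ageing": "C001", "senescence": "C001"}
CONCEPT_LABELS = {"C001": ["aging", "Aging process"]}


def substring_search(query, labels):
    """String-based search: any label containing the query characters."""
    q = query.lower()
    return [label for label in labels if q in label.lower()]


def whole_word_search(query, labels):
    """Slightly better: the query must appear as a whole word."""
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    return [label for label in labels if pattern.search(label)]


def concept_search(query):
    """Concept-based: map the query to a concept first, then to its labels."""
    concept_id = ENTRY_TERMS.get(query.lower())
    return CONCEPT_LABELS.get(concept_id, [])
```

Here `substring_search("aging", CLASS_LABELS)` returns every label, including the three unrelated ones, because "imaging" and "lavaging" both contain the characters "aging". `whole_word_search` removes those false hits, and `concept_search("ageing")` additionally finds the concept from a spelling variant that shares no whole-word match with the labels.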

Page 18: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Issues (4)

4. Multilingual and multi-cultural issues in the mapping process

-- non-English schemes

-- non-symmetrical schemes

– [Based on my experience in building a conceptual framework for Complementary and Alternative Medicine (CAM)]

Page 19: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Summary (1)

• Some well-established thesauri and taxonomies should be available for reuse by content developers.

• Terminology services may already have gone through processes that OOR may encounter.

• Issues in reusing and sharing the content of thesauri and taxonomies in OOR include granularity, structure, encoding, etc. Addressing them requires OOR to have policies, strategies, and enabling tools.

• Concept-based mapping will be a major need and will also bring many issues.

Page 20: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

Summary (2)

• Questions to OOR:

– What representation formats will OOR allow?

– Will OOR provide access to individual ontology elements?

– Does OOR see itself as providing (web) services in addition to providing access to discover and download ontologies?

Page 21: Issues in Reusing and Sharing the Content of Thesauri and Taxonomies in OOR

References

• NKOS website http://nkos.slis.kent.edu/

• HILT (High Level Thesaurus) project http://hilt.cdlr.strath.ac.uk/index.html

• OCLC Terminologies Service

– Research: http://www.oclc.org/research/projects/termservices/

– Service: http://www.oclc.org/terminologies/default.htm

• NSDL [vocabulary] Registry http://metadataregistry.org/vocabulary/list.html

• STAR (Semantic Technologies for Archaeological Resources) http://hypermedia.research.glam.ac.uk/kos/star/ -- SKOS terminology services http://hypermedia.research.glam.ac.uk/kos/terminology_services/

• TRSS (Terminology Registry Scoping Study) http://www.ukoln.ac.uk/projects/trss/

• Alistair Miles, Taxonomies and the Semantic Web, CISTRANA Workshop 02/06 http://isegserv.itd.rl.ac.uk/public/skos/press/cistrana200602/taxonomies-semanticweb.ppt

• JISC state-of-the-art review "Terminology Services and Technologies" http://www.ukoln.ac.uk/terminology/JISC-review2006.html