Thesauri, Controlled Terminologies, and other solutions

Paul Miller (UKOLN) & Matthew Stiff (mda)

Outline

• Making words more effective...
  – Introducing Controlled Terminology
  – Introducing Thesauri

• From micro to macro
  – Localised vocabularies
  – Going online...

• Issues...
  – ...for Users
  – ...for Creators

The need for control...

European Community

E.E.C.

Common Market

European Union!

Without control...

Users are
– incorrectly utilising search terms
– failing to find significant resources
– suffering from information overload
– almost as well off using Alta Vista

Creators are
– cataloguing inconsistently
– unable to convey hierarchical concepts
  – Scotland is in United Kingdom is in Europe is in ...
– perpetuating localised terminology
– unable to assess, let alone undertake, integration projects.

With control...

Users might
– gain more effective access to a resource
– gain far more effective access across resources
– reduce the number of ‘false hits’
– find what they are looking for
– even learn to think and express themselves in a structured manner.

Creators might
– produce more valuable resources
– convey complex semantic and structural concepts
– move towards disciplinary, national, international or global terminologies
– effectively integrate both new and existing resources.

Controlled Vocabulary

European Union

E.E.C.

Common Market

European Community

... Etc.

With a controlled vocabulary, one or more of these terms might be permitted. Use of the others for record creation or retrieval would be rejected by the system.
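
The rejection behaviour described here can be sketched in a few lines of Python. This is only an illustration: which term is permitted, and the names `PERMITTED_TERMS`, `validate_term` and `is_accepted`, are assumptions made for the example and do not come from any particular cataloguing system.

```python
# Hypothetical controlled vocabulary. The choice of which term is permitted
# is an assumption made purely for illustration.
PERMITTED_TERMS = {"European Union"}


def validate_term(term: str) -> str:
    """Return the term unchanged if it is permitted, otherwise reject it."""
    if term not in PERMITTED_TERMS:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return term


def is_accepted(term: str) -> bool:
    """Convenience wrapper: True if the term is accepted, False if rejected."""
    try:
        validate_term(term)
    except ValueError:
        return False
    return True
```

The same check would apply both when a record is created and when a search is submitted, which is why a user who types ‘Common Market’ finds nothing unless some further mechanism redirects them.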

Thesaurus-based Control

European Union [preferred term]

E.E.C. [synonym]

Common Market [synonym]

European Community [synonym]

... Etc. [synonyms]

In a thesaurus, all of the terms might be considered equally valid, with one identified as the preferred term and the others as synonyms.

But... Are they really synonymous...?
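
By contrast with outright rejection, a thesaurus can accept any of the entry terms and redirect them to the preferred one. The following Python sketch assumes the four terms are treated as interchangeable, which is exactly the simplification questioned above; the names `NON_PREFERRED` and `resolve` are illustrative only.

```python
PREFERRED_TERMS = {"European Union"}

# Non-preferred entry terms, each pointing at its preferred term.
# Treating these as true synonyms is an assumption of this sketch.
NON_PREFERRED = {
    "E.E.C.": "European Union",
    "Common Market": "European Union",
    "European Community": "European Union",
}


def resolve(term: str) -> str:
    """Map any accepted entry term to its preferred term."""
    if term in PREFERRED_TERMS:
        return term
    if term in NON_PREFERRED:
        return NON_PREFERRED[term]
    raise KeyError(f"'{term}' is not known to the thesaurus")
```

Records indexed under the preferred term can then be found whichever of the entry terms the user happens to type.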

Exerting Control

• Controlled vocabularies
  – apparently simple

• Alphanumeric classification schema
  – Dewey and Universal Decimal Classifications, etc.
  – Have much in common with thesauri and controlled vocabularies.
  – Discussed in more detail by DESIRE
    • http://www.ub2.lu.se/desire/radar/reports/D3.2.3/

• These, and thesauri, refine meaning.

Thesauri

• A traditional thesaurus defines synonyms and, perhaps, antonyms for terms within a given language.

• E.g.
  – ‘workshop’
    atelier, factory, mill, plant, shop, studio, workroom
    ...or... ?
    class, discussion group, seminar, study group.

Thesauri in Information Retrieval

• In the context of information retrieval, thesauri do more, facilitating the creation of hierarchies of meaning...

Hierarchies of Meaning

‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘Red wine glass’
    ‘White wine glass’
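
One practical use of such a hierarchy is to widen a search so that a query on a broad term also retrieves records indexed under its narrower terms. The Python sketch below assumes the nesting implied by the term names (beer and wine glasses under ‘Glass’, red and white wine glasses under ‘Wine Glass’); the `NARROWER` table and the `expand` function are illustrative names.

```python
from typing import Dict, List

# Narrower-term links, assuming the nesting implied by the term names.
NARROWER: Dict[str, List[str]] = {
    "Glass": ["Beer Glass", "Wine Glass"],
    "Wine Glass": ["Red wine glass", "White wine glass"],
}


def expand(term: str) -> List[str]:
    """Return the term followed by all of its narrower terms, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result
```

With this table, `expand("Wine Glass")` yields the term itself plus the red and white variants, while `expand("Beer Glass")` yields only the term itself.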

Thesaurus Components

• Most thesauri are constructed in a standard form, as defined by ISO 2788 and various national standards.

– ISO 5964 extends discussion to multilingual issues

• Four basic relationships are fundamental in thesaurus construction and use...

– Equivalence (preferred and non-preferred terms)

– Hierarchy (‘glass’ is broader than ‘wine glass’)

– Association (establishes non-hierarchical relationships)

– Scope notes (provide guidance and clarification)

Equivalence

• As with the European Union example, there are often situations in which users or cataloguers wish to allow multiple synonyms for any one term.

– In these cases, one term may be defined as a preferred term

“Electricity Plant
USE Power Station”

– Here, ‘Power Station’ is the preferred term

Example from RCHME Thesaurus of Monument Types, © RCHME 1995.

Hierarchy

• An important capability of thesauri is their ability to reflect hierarchies, whether conceptual, spatial, or whatever.

– Individual thesaurus entries are linked to a class (CL), as well as to broader (BT) and narrower (NT) terms.

“BAYONET
CL Armour and Weapons
BT Edged Weapon
NT Plug Bayonet
NT Socket Bayonet”

Example from mda Archaeological Objects Thesaurus, © mda, English Heritage, RCHME 1997.
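
Broader and narrower links are two views of the same relationship, so a system can store one direction and derive the other. The Python sketch below stores only the broader term for each entry in the BAYONET example, then derives the narrower-term lists and the chain of ancestors. It assumes each term has a single broader term and that the links contain no cycles; `BROADER`, `narrower_terms` and `ancestors` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List

# One broader term (BT) per entry, taken from the BAYONET example.
# A single BT per term is an assumption of this sketch.
BROADER: Dict[str, str] = {
    "Plug Bayonet": "BAYONET",
    "Socket Bayonet": "BAYONET",
    "BAYONET": "Edged Weapon",
}


def narrower_terms(broader: Dict[str, str]) -> Dict[str, List[str]]:
    """Derive the NT lists by inverting the BT links."""
    index = defaultdict(list)
    for term, parent in broader.items():
        index[parent].append(term)
    return {parent: sorted(children) for parent, children in index.items()}


def ancestors(term: str, broader: Dict[str, str]) -> List[str]:
    """Walk up the BT links; assumes the links contain no cycles."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain
```

Deriving one direction from the other avoids the situation where an entry lists a narrower term that does not point back to it.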

Association

• In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy.

– Related Terms (RT) can be used to show these links within the thesaurus.

“CHURCH
RT Churchyard
RT Crypt
RT Presbytery”

Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
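
Related-term links are normally treated as reciprocal: if CHURCH lists Crypt as a related term, Crypt should list CHURCH in return. The Python sketch below builds a two-way index from the one-way links in the CHURCH example, on that assumption; `RELATED` and `reciprocal_related` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List

# One-way RT links as they appear in the CHURCH example.
RELATED: Dict[str, List[str]] = {
    "CHURCH": ["Churchyard", "Crypt", "Presbytery"],
}


def reciprocal_related(related: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return an index in which every RT link is recorded in both directions."""
    index = defaultdict(set)
    for term, others in related.items():
        for other in others:
            index[term].add(other)
            index[other].add(term)
    return {term: sorted(others) for term, others in index.items()}
```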

Scope Notes

• Thesaurus entries can often be terse, and difficult to interpret for the non-expert.

– Scope Notes (SN) serve to clarify entries and avoid possible confusion. They serve to embody the underlying concept, rather than the language-specific word.

“CHITTING HOUSE
SN A building in which potatoes can sprout and germinate”

“FERRY
SN Includes associated structures”

Examples from RCHME Thesaurus of Monument Types, © RCHME 1995.

Putting it all together...

“FERROUS METAL EXTRACTION SITE

SN Includes preliminary processing

CL Industrial

BT Metal Industry Site

NT Ironstone Mine

NT Ironstone Pit

NT Ironstone Workings

RT Ironstone Workings”

Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
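
An entry in this tagged layout maps naturally onto a small record structure. The Python sketch below parses the entry above into a dataclass, assuming one tag per line with the term itself on the first line; the `ThesaurusEntry` class, its field names and `parse_entry` are illustrative and are not part of the RCHME thesaurus or any standard.

```python
from dataclasses import dataclass, field
from typing import List

ENTRY_TEXT = """\
FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings
"""


@dataclass
class ThesaurusEntry:
    term: str
    scope_note: str = ""
    term_class: str = ""
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def parse_entry(text: str) -> ThesaurusEntry:
    """Parse one entry: the first line is the term, later lines are 'TAG value'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    entry = ThesaurusEntry(term=lines[0])
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag == "SN":
            entry.scope_note = value
        elif tag == "CL":
            entry.term_class = value
        elif tag == "BT":
            entry.broader.append(value)
        elif tag == "NT":
            entry.narrower.append(value)
        elif tag == "RT":
            entry.related.append(value)
        else:
            raise ValueError(f"unrecognised tag: {tag!r}")
    return entry
```

Calling `parse_entry(ENTRY_TEXT)` gives a record with one broader term, three narrower terms and one related term, which is the form a cataloguing or search tool would work with.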

If there were more time…

• Grouping Terms…

• Facet indicators…

• Homonyms…

• And lots more!

Working with the tools

• Thesauri, controlled vocabulary lists, etc., are all useful, but they
  – often rely upon both cataloguers and users having direct access to these usually weighty tomes
  – require an awareness of cataloguing issues and practice to be used most effectively
  – have predominantly developed within, rather than between, communities, regions, etc.
  – rapidly become destabilised as distributed users add new terms in a non-complementary fashion

Effective distributed thesauri [1]

• In order for thesauri to be effective in the online environment, research and good practice need to address:
  – mapping between existing thesauri
    – technical mapping
    – semantic mapping
      • are ‘E.E.C.’ and ‘Common Market’ synonymous?
    – restructuring one or both where necessary/possible
  – inter-disciplinary mapping
    • the ‘God Problem’
  – addressing legacy data
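
The semantic side of mapping can be made explicit by recording, for each link between two thesauri, how close the match is judged to be. The Python sketch below is a hypothetical crosswalk: the two thesauri, the `kind` labels ("exact", "close") and the judgements attached to each term are all assumptions for illustration, not drawn from any published mapping.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional


class Mapping(NamedTuple):
    target: str  # term in the target thesaurus
    kind: str    # how good the match is judged to be: "exact" or "close"


# Hypothetical crosswalk from one thesaurus to another. The 'kind' values are
# illustrative assumptions, not judgements from any real mapping exercise.
CROSSWALK: Dict[str, Mapping] = {
    "Electricity Plant": Mapping("Power Station", "exact"),
    "E.E.C.": Mapping("European Union", "close"),
}


def translate(term: str,
              accept: FrozenSet[str] = frozenset({"exact"})) -> Optional[str]:
    """Return the mapped term if its match kind is acceptable, else None."""
    mapping = CROSSWALK.get(term)
    if mapping is None or mapping.kind not in accept:
        return None
    return mapping.target
```

A strict search would call `translate(term)` and accept only exact matches; a more forgiving one could pass `accept=frozenset({"exact", "close"})` and show the user that the match is approximate.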

Effective distributed thesauri [2]

  – delivery of training to remote cataloguers
  – providing online access to more existing thesauri
  – development of cataloguing tools
    – capable of accessing various remote thesauri and selecting terms in an intuitive, timely fashion
      • Nordic Metadata Project Dublin Core tool
  – raising the profile of thesauri as “A Good Thing”!
  – development of user interface tools
    – capable of integrating various remote thesauri into the search process without slowing it intolerably, losing contextual awareness or subjecting the browser to information overload.