1 Thesauri, Controlled Terminologies, and other solutions Paul Miller (UKOLN) & Matthew Stiff (mda)
1
Thesauri, Controlled Terminologies, and other solutions
Paul Miller (UKOLN) & Matthew Stiff (mda)
2
Outline
• Making words more effective...
– Introducing Controlled Terminology
– Introducing Thesauri
• From micro to macro
– Localised vocabularies
– Going online...
• Issues...
– ...for Users
– ...for Creators
3
The need for control...
European Community
E.E.C.
Common Market
European Union!
4
Without control...
Users are
– incorrectly utilising search terms
– failing to find significant resources
– suffering from information overload
– almost as well off using Alta Vista
Creators are
– cataloguing inconsistently
– unable to convey hierarchical concepts
  – Scotland is in United Kingdom is in Europe is in ...
– perpetuating localised terminology
– unable to assess, let alone undertake, integration projects.
5
With control...
Users might
– gain more effective access to a resource
– gain far more effective access across resources
– reduce the number of ‘false hits’
– find what they are looking for
– even learn to think and express themselves in a structured manner.
Creators might
– produce more valuable resources
– convey complex semantic and structural concepts
– move towards disciplinary, national, international or global terminologies
– effectively integrate both new and existing resources.
6
Controlled Vocabulary
European Union
E.E.C.
Common Market
European Community
... Etc.
With a controlled vocabulary, one or more of these terms might be permitted. Use of the others for record creation or retrieval would be rejected by the system.
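The rejection behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration: the choice of 'European Union' as the single permitted term, and the names `PERMITTED_TERMS`, `is_permitted` and `validate_term`, are assumptions made for the example, not part of any real system.

```python
# Hypothetical permitted list: which terms are allowed is a local policy choice.
PERMITTED_TERMS = {"European Union"}


def is_permitted(term: str) -> bool:
    """Return True if the term is in the controlled vocabulary."""
    return term.strip() in PERMITTED_TERMS


def validate_term(term: str) -> str:
    """Return the cleaned term if permitted, otherwise raise ValueError.

    A cataloguing or search form could call this before accepting input.
    """
    cleaned = term.strip()
    if not is_permitted(cleaned):
        raise ValueError(f"'{cleaned}' is not in the controlled vocabulary")
    return cleaned
```

Under these assumptions, `validate_term("European Union")` is accepted, while `validate_term("E.E.C.")` is rejected with an error rather than being redirected to another term.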
7
Thesaurus-based Control
European Union [preferred term]
E.E.C. [synonym]
Common Market [synonym]
European Community [synonym]
... Etc. [synonyms]
In a thesaurus, all of the terms might be considered equally valid, with one identified as the preferred term and the others as synonyms.
But... Are they really synonymous...?
8
Exerting Control
• Controlled vocabularies
– apparently simple
• Alphanumeric classification schema
– Dewey and Universal Decimal Classifications, etc.
– Have much in common with thesauri and controlled vocabularies.
– Discussed in more detail by DESIRE
• http://www.ub2.lu.se/desire/radar/reports/D3.2.3/
• These, and thesauri, refine meaning.
9
Thesauri
• A traditional thesaurus defines synonyms and, perhaps, antonyms for terms within a given language.
• E.g.
– ‘workshop’
atelier, factory, mill, plant, shop, studio, workroom
...or... ?
class, discussion group, seminar, study group.
10
Thesauri in Information Retrieval
• In the context of information retrieval, thesauri do more, facilitating the creation of hierarchies of meaning...
11
Hierarchies of Meaning
‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘Red wine glass’
    ‘White wine glass’
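One practical use of such a hierarchy is to widen a search so that a query for a broad term also retrieves records indexed under its narrower terms. The sketch below assumes the nesting Glass > Wine Glass > Red/White wine glass, which is inferred from the terms themselves; `NARROWER` and `expand` are hypothetical names used only for this example.

```python
# Assumed nesting of the slide's terms: each key maps to its narrower terms.
NARROWER = {
    "Glass": ["Beer Glass", "Wine Glass"],
    "Wine Glass": ["Red wine glass", "White wine glass"],
}


def expand(term: str) -> list[str]:
    """Return the term followed by all of its narrower terms, depth first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result
```

With this data, a search on 'Wine Glass' would cover three terms, while a search on 'Beer Glass' would cover only itself.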
12
Thesaurus Components
• Most thesauri are constructed in a standard form, as defined by ISO 2788 and various national standards.
– ISO 5964 extends discussion to multilingual issues
• Four basic relationships are fundamental in thesaurus construction and use...
– Equivalence (preferred and non-preferred terms)
– Hierarchy (‘glass’ is broader than ‘wine glass’)
– Association (establishes non-hierarchical relationships)
– Scope notes (provide guidance and clarification)
13
Equivalence
• As with the European Union example, there are often situations in which users or cataloguers wish to allow multiple synonyms for any one term.
– In these cases, one term may be defined as a preferred term
“Electricity Plant USE Power Station”
– Here, ‘Power Station’ is the preferred term
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
14
Hierarchy
• An important capability of thesauri is their ability to reflect hierarchies, whether conceptual, spatial, or otherwise.
– Individual thesaurus entries are linked to a class (CL), as well as to broader (BT) and narrower (NT) terms.
“BAYONET
CL Armour and Weapons
BT Edged Weapon
NT Plug Bayonet
NT Socket Bayonet”
Example from mda Archaeological Objects Thesaurus, © mda, English Heritage, RCHME 1997.
15
Association
• In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy.
– Related Terms (RT) can be used to show these links within the thesaurus.
“CHURCH
RT Churchyard
RT Crypt
RT Presbytery”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
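Associative links are conventionally reciprocal: if CHURCH lists Churchyard as a related term, the Churchyard entry would be expected to point back to Church. A maintenance tool might check for this. The sketch below is illustrative only; the Churchyard entry shown is invented for the example and `missing_reciprocals` is a hypothetical helper.

```python
def missing_reciprocals(related: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return sorted (term, target) pairs where term lacks an RT link to target."""
    missing = []
    for term, others in related.items():
        for other in others:
            # 'other' should list 'term' among its own related terms.
            if term not in related.get(other, set()):
                missing.append((other, term))
    return sorted(missing)


# The Church entry follows the slide; the Churchyard entry is an assumption.
RELATED = {
    "Church": {"Churchyard", "Crypt", "Presbytery"},
    "Churchyard": {"Church"},
}
GAPS = missing_reciprocals(RELATED)
```

Here `GAPS` reports that Crypt and Presbytery have no link back to Church, which is the kind of drift that appears when several people add terms independently.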
16
Scope Notes
• Thesaurus entries can often be terse, and difficult to interpret for the non-expert.
– Scope Notes (SN) serve to clarify entries and avoid possible confusion. They serve to embody the underlying concept, rather than the language-specific word.
“CHITTING HOUSE
SN A building in which potatoes can sprout and germinate”
“FERRY
SN Includes associated structures”
Examples from RCHME Thesaurus of Monument Types, © RCHME 1995.
17
Putting it all together...
“FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings”
Example from RCHME Thesaurus of Monument Types, © RCHME 1995.
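An entry laid out like this, one tagged line per relationship, is straightforward to read into a structured record. The sketch below assumes exactly that layout (head term on the first line, then lines starting with SN, CL, BT, NT or RT) and does not handle scope notes that wrap over several lines. `Entry`, `LIST_FIELDS` and `parse_entry` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    scope_note: str = ""
    classes: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)
    narrower: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


# Tag -> name of the list attribute that collects its values.
LIST_FIELDS = {"CL": "classes", "BT": "broader", "NT": "narrower", "RT": "related"}


def parse_entry(text: str) -> Entry:
    """Parse one entry: head term first, then one tagged line per relationship."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    entry = Entry(term=lines[0])
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag == "SN":
            entry.scope_note = value
        elif tag in LIST_FIELDS:
            getattr(entry, LIST_FIELDS[tag]).append(value)
        else:
            raise ValueError(f"unrecognised tag in line: {line!r}")
    return entry


SAMPLE = """
FERROUS METAL EXTRACTION SITE
SN Includes preliminary processing
CL Industrial
BT Metal Industry Site
NT Ironstone Mine
NT Ironstone Pit
NT Ironstone Workings
RT Ironstone Workings
"""
entry = parse_entry(SAMPLE)
```

Once parsed, `entry.narrower` and `entry.related` can be compared directly, which makes it easy to spot a term that appears under more than one relationship.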
18
If there were more time…
• Grouping Terms…
• Facet indicators…
• Homonyms…
• And lots more!
19
Working with the tools
• Thesauri, controlled vocabulary lists, etc., are all useful, but they
– often rely upon both cataloguers and users having direct access to these usually weighty tomes
– require an awareness of cataloguing issues and practice to be used most effectively
– have predominantly developed within, rather than between, communities, regions, etc.
– rapidly become destabilised as distributed users add new terms in a non-complementary fashion
20
Effective distributed thesauri [1]
• In order for thesauri to be effective in the online environment, research and good practice need to address:
– mapping between existing thesauri
  – technical mapping
  – semantic mapping
    • are ‘E.E.C.’ and ‘Common Market’ synonymous?
  – restructuring one or both where necessary/possible
– inter-disciplinary mapping
  • the ‘God Problem’
– addressing legacy data
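One way to picture semantic mapping is as a table of term pairs, each marked with how closely the two terms correspond, so that a search tool can decide whether to follow only exact matches or looser ones too. Everything in the sketch below is an assumption made for illustration: the mapping rows, the 'exact' and 'close' labels, and the `translate` function are not drawn from any published mapping.

```python
# Hypothetical mapping rows: (source term, target term, kind of match).
MAPPINGS = [
    ("Electricity Plant", "Power Station", "exact"),
    ("E.E.C.", "European Union", "close"),
    ("Common Market", "European Union", "close"),
]


def translate(term: str, allow_close: bool = False) -> list[str]:
    """Return target-thesaurus terms for a source term.

    Only 'exact' matches are followed unless allow_close is True.
    """
    accepted = {"exact", "close"} if allow_close else {"exact"}
    return [target for source, target, kind in MAPPINGS
            if source == term and kind in accepted]
```

Marking 'E.E.C.' as only a close match reflects the question raised above: a strict search would not treat it as interchangeable with 'European Union', while a broader search could.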
21
Effective distributed thesauri [2]
– delivery of training to remote cataloguers
– providing online access to more existing thesauri
– development of cataloguing tools
  – capable of accessing various remote thesauri and selecting terms in an intuitive, timely fashion
    • Nordic Metadata Project Dublin Core tool
– raising the profile of thesauri as “A Good Thing”!
– development of user interface tools
  – capable of integrating various remote thesauri into the search process without slowing it intolerably, losing contextual awareness or subjecting the browser to information overload.