1 Enhancing search An update on taxonomies, metadata and thesauri Leonard Will Willpower...

1

Enhancing search: An update on taxonomies, metadata and thesauri

Leonard Will

Willpower Information

2

Summary

1 Metadata creation is cataloguing

2 Taxonomies are classifications

3 Thesauri and classifications are complementary ways of grouping concepts

4 Facet analysis is a useful technique for constructing schemes systematically

5 Most computer search interfaces are inadequate

3

Metadata = catalogue records

• Resources: any things that can be identified

– documents, web pages, images, sound files, teaching packages, books, museum objects, people, organisations

• Metadata: structured information about resources

– May be embedded in the resources themselves (e.g. Cataloguing in Publication (CIP) data) or collected in separate “union catalogues” (e.g. by harvesting with OAI-PMH)

– Some from the resource itself (size, format), some from external sources (provenance, location, accessibility)

4

Metadata standards

• Anglo-American Cataloguing Rules (AACR)
• Encoded Archival Description (EAD)
• Learning Object Metadata (LOM)
• Spectrum standard for museum information
• Friend of a Friend (FOAF) and vCard
• e-Government Metadata Standard (eGMS)
• Dublin Core: the lowest common denominator

5

Kinds of standards

• Content standards: which pieces of information are to be recorded (DC, AACR)

• Value standards: how the information is to be recorded (= DC encoding schemes)
– formats (ISO 8601 date format, NCA name formats, AACR)
– lists of valid values (thesauri, authority files)

• Structure standards: how the information is to be grouped and labelled for use by computers and humans (XML schemas, MARC)

• Application profiles: choices from the above, made for a particular application
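To make the difference between content standards and value standards concrete, here is a minimal Python sketch that checks two fields of a record against value standards: a date that must be an ISO 8601 calendar date written as YYYY-MM-DD, and a subject that must come from a list of valid values. The field names and the three-term vocabulary are hypothetical, not taken from any real application profile.

```python
import re
from datetime import date

# Hypothetical list of valid values (a stand-in for a thesaurus or authority file)
VALID_SUBJECTS = {"agriculture", "forestry", "livestock"}


def check_value_standards(record: dict) -> list[str]:
    """Return the problems found in a record; an empty list means it conforms."""
    problems = []
    raw_date = str(record.get("date", ""))
    try:
        # Format standard: ISO 8601 calendar date written as YYYY-MM-DD
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw_date):
            raise ValueError(raw_date)
        date.fromisoformat(raw_date)  # also rejects impossible dates such as 2005-02-30
    except ValueError:
        problems.append(f"date not a valid YYYY-MM-DD value: {raw_date!r}")
    # List of valid values: the subject must be taken from the controlled vocabulary
    if record.get("subject") not in VALID_SUBJECTS:
        problems.append(f"subject not in vocabulary: {record.get('subject')!r}")
    return problems
```

For example, `{"date": "14/03/2005", "subject": "farming"}` fails both checks, while `{"date": "2005-03-14", "subject": "agriculture"}` passes.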

6

Dublin Core metadata

• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
• + element refinements
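A Dublin Core record is simply a set of optional, repeatable elements. The following Python sketch, offered only as an illustration, serialises such a record as XML using the standard `xml.etree.ElementTree` module and the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `record` wrapper element is a hypothetical choice; real exchange formats (for example the `oai_dc` format used with OAI-PMH) define their own wrappers, and element refinements are not handled here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
]


def dc_record_to_xml(record: dict[str, list[str]]) -> str:
    """Serialise a record; every element is optional and may be repeated."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:  # emit elements in the order of the element set
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record_to_xml({"title": ["Enhancing search"],
                        "creator": ["Leonard Will"],
                        "subject": ["thesauri", "metadata"]}))
```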

7

Subject

“Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource.

Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.”

8

Taxonomies = controlled vocabularies

• “Taxonomy” has a woolly meaning, which leads to confusion
– keep it for biological classification systems

• Knowledge organization systems (KOS)
– a better expression for the general concept

• Main types are
– thesauri
– classification schemes
– ontologies

9

Thesauri and classification schemes

• Thesauri and classification schemes are alternative ways of showing concepts and their relationships

• They are complementary and both approaches are needed

• They can both be built on the principles of facet analysis

10

Building blocks of all knowledge organisation schemes

• concepts

• relationships

35 mm cameras
  CC: H012
  BT: film cameras
aqualungs
  CC: D002
  BT: diving equipment
camera accessories
  CC: H002
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras
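Concepts and relationships like these can be held in a small data structure. The Python sketch below is a minimal illustration, assuming each concept is identified by its descriptor string; it is not modelled on any particular thesaurus package. Its main point is that BT and NT are two views of one relationship and that RT is symmetric, so recording one direction should create the other automatically.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store of concepts with reciprocal BT/NT and symmetric RT links."""

    def __init__(self) -> None:
        self.bt: dict[str, set[str]] = defaultdict(set)  # broader terms
        self.nt: dict[str, set[str]] = defaultdict(set)  # narrower terms
        self.rt: dict[str, set[str]] = defaultdict(set)  # related terms
        self.cc: dict[str, str] = {}                     # classification codes

    def add_broader(self, narrower: str, broader: str) -> None:
        # Recording "X BT Y" also records the reciprocal "Y NT X"
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric: each term points to the other
        self.rt[a].add(b)
        self.rt[b].add(a)


t = Thesaurus()
t.cc["camera accessories"] = "H002"
t.add_broader("camera accessories", "photographic equipment")
for term in ("flash guns", "light meters", "tripods"):
    t.add_broader(term, "camera accessories")
t.add_related("camera accessories", "cameras")
```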

11

Relationships are between concepts, not words

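The broader concept (BT) can be labelled: vehicles, road vehicles, conveyances, voitures (French), 388.34, 629.2

The narrower concept (NT) can be labelled: cars, automobiles, autos, private cars, 388.342, 629.222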

Choose one term as a descriptor to label the concept:

cars USE automobiles

12

Preferred term substitution

Anything on farming?

I use the term agriculture for farming, so I’ll search for that
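Preferred term substitution amounts to a lookup from each non-preferred term to its descriptor, with the reciprocal UF (used for) list derived from the same table. A minimal Python sketch follows; the three entries are hypothetical, and case folding is an assumed design choice rather than a rule from any standard.

```python
# Hypothetical entry vocabulary: each non-preferred term USE its descriptor
USE = {
    "farming": "agriculture",
    "cars": "automobiles",
    "autos": "automobiles",
}


def preferred(term: str) -> str:
    """Replace a non-preferred term by its descriptor; descriptors pass through."""
    key = term.strip().lower()
    return USE.get(key, key)


def used_for(descriptor: str) -> list[str]:
    """The reciprocal UF list: all non-preferred terms that point to a descriptor."""
    return sorted(t for t, d in USE.items() if d == descriptor)


print(preferred("Farming"))        # agriculture
print(used_for("automobiles"))     # ['autos', 'cars']
```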

13

Relationships between concepts

• Paradigmatic, or a priori: apply generally, independently of any specific document
– shoes BT footwear
– shoes RT shoemakers

• Syntagmatic, or a posteriori: concepts that are related only in the context of a specific document
– shoes : history
– shoes : prices

A thesaurus can show paradigmatic relationships

A classification scheme can also show syntagmatic relationships

14

Searching hierarchies

I need information on road vehicles

I know that buses, cars and lorries are all kinds of road vehicles, so I’ll search for these terms as well as for road vehicles
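Searching a hierarchy in this way ("exploding" a term) means collecting the term together with all its narrower terms at every level. A minimal Python sketch, assuming a hypothetical table of narrower terms (the entry "tipper lorries" is invented for illustration):

```python
# Hypothetical fragment of a thesaurus: each term mapped to its narrower terms
NARROWER = {
    "road vehicles": ["buses", "cars", "lorries"],
    "lorries": ["tipper lorries"],
}


def expand_down(term: str) -> list[str]:
    """Return the term followed by all its narrower terms, at every level."""
    result, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against a term reached by two paths
            continue
        seen.add(current)
        result.append(current)
        # push in reverse so narrower terms come out in their listed order
        stack.extend(reversed(NARROWER.get(current, [])))
    return result


print(" OR ".join(expand_down("road vehicles")))
# road vehicles OR buses OR cars OR lorries OR tipper lorries
```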

15

Searching related terms

Please give me information about agriculture

OK, I’ll look for that. Would you also be interested in items dealing with forestry, livestock or pet breeding?
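Unlike narrower terms, related terms are usually offered to the searcher rather than added automatically, because the nature of an RT link is not specified. A minimal Python sketch of that behaviour, with a hypothetical RT table:

```python
# Hypothetical RT links from one descriptor
RELATED = {
    "agriculture": ["forestry", "livestock", "pet breeding"],
}


def suggest_related(term: str) -> str:
    """Search for the term as asked, and offer (not add) its related terms."""
    related = RELATED.get(term, [])
    if not related:
        return f"Searching for {term}."
    if len(related) == 1:
        offer = related[0]
    else:
        offer = ", ".join(related[:-1]) + " or " + related[-1]
    return f"Searching for {term}. Would you also be interested in {offer}?"


print(suggest_related("agriculture"))
```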

16

Paradigmatic relationships in a thesaurus

• Many relationships are shown only as RT/RT, without specifying their nature, so they cannot be used for systematic grouping (ontologies overcome this by naming each type of relationship)

• The hierarchical generic-specific relationship (BT/NT) allows, and in fact requires, concepts to be grouped into facets: a broader term and its narrower terms must be in the same facet

17

What is a facet? (Sometimes called a fundamental facet)

A high-level grouping of concepts of the same inherent category, e.g. activities, disciplines, people, materials, places, times. For example:

animals, mice, daffodils and bacteria could all be members of a living organisms facet;

digging, writing and cooking could all be members of an activities facet;

birthdays, wars and football matches could all be members of an events facet.

A concept cannot belong to more than one facet
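Two rules from this deck can be checked mechanically: each concept belongs to exactly one facet, and a BT/NT pair must lie within a single facet. The Python sketch below is a hypothetical illustration using some of the example concepts; storing the facet as a single dictionary value makes membership of two facets impossible by construction.

```python
from enum import Enum


class Facet(Enum):
    LIVING_ORGANISMS = "living organisms"
    ACTIVITIES = "activities"
    EVENTS = "events"


# Each concept maps to exactly one facet: a dict key cannot hold two values
FACET_OF = {
    "animals": Facet.LIVING_ORGANISMS,
    "mice": Facet.LIVING_ORGANISMS,
    "digging": Facet.ACTIVITIES,
    "cooking": Facet.ACTIVITIES,
    "wars": Facet.EVENTS,
}


def cross_facet_links(bt_pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (narrower, broader) pairs whose terms sit in different facets."""
    return [(n, b) for n, b in bt_pairs if FACET_OF[n] != FACET_OF[b]]


print(cross_facet_links([("mice", "animals"), ("cooking", "animals")]))
# [('cooking', 'animals')]
```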

19

What is an array? (Sometimes called a subfacet)

A grouping of concepts within a facet by some stated characteristic of division.

vehicles
<vehicles by number of wheels>
. bicycles
. tricycles
. four-wheeled vehicles
. . automobiles
<vehicles by load carried>
. goods vehicles
. . lorries
. passenger vehicles
. . automobiles
. . buses

Each group introduced by a node label in angle brackets, showing its characteristic of division, is an array.

A concept may occur in more than one array (here, automobiles appears in both)
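Because a concept may occur in more than one array, a program that stores arrays should allow the same concept under several of them. A minimal Python sketch using the vehicles example; the data layout (arrays keyed by their characteristic of division) is an assumption made for illustration.

```python
# Arrays within the vehicles facet, keyed by characteristic of division
ARRAYS = {
    "vehicles by number of wheels": ["bicycles", "tricycles", "four-wheeled vehicles"],
    "vehicles by load carried": ["goods vehicles", "passenger vehicles"],
}
# Narrower terms below the array members
NARROWER = {
    "four-wheeled vehicles": ["automobiles"],
    "goods vehicles": ["lorries"],
    "passenger vehicles": ["automobiles", "buses"],
}


def arrays_containing(concept: str) -> list[str]:
    """Names of the arrays in which a concept appears, directly or lower down."""
    def under(term: str) -> bool:
        return term == concept or any(under(child) for child in NARROWER.get(term, []))
    return [name for name, members in ARRAYS.items() if any(under(m) for m in members)]


print(arrays_containing("automobiles"))
# ['vehicles by number of wheels', 'vehicles by load carried']
```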

20

Parametric search

• Searching for resources that have one or more specified characteristics

• e.g. vehicles which
– have three wheels AND
– are used for carrying passengers

• This is an important and useful aspect of post-coordinate searching, but it is not faceted classification
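Parametric search, as described here, is post-coordinate filtering on attribute values: every stated parameter must match. A minimal Python sketch; the `Vehicle` record type and the three sample resources are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    name: str
    wheels: int
    carries: str  # "passengers" or "goods"


# Hypothetical resources described by parameters
VEHICLES = [
    Vehicle("auto rickshaw", 3, "passengers"),
    Vehicle("cargo tricycle", 3, "goods"),
    Vehicle("bus", 6, "passengers"),
]


def parametric_search(items: list[Vehicle], **criteria) -> list[str]:
    """Return names of items matching every attribute=value criterion (AND)."""
    return [v.name for v in items
            if all(getattr(v, attr) == value for attr, value in criteria.items())]


print(parametric_search(VEHICLES, wheels=3, carries="passengers"))  # ['auto rickshaw']
```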

21

Ways of displaying concepts and their paradigmatic relationships

1. Alphabetically, with their relationships

35 mm cameras
  BT: film cameras
aqualungs
  BT: diving equipment
camera accessories
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras
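An alphabetical display can be generated from the stored relationships. The Python sketch below is a minimal, hypothetical layout: entries are sorted A to Z and each is followed by its BT, NT and RT lines in that order, with continuation terms aligned under the first.

```python
# Hypothetical entries: term -> {relationship code: [terms]}
ENTRIES = {
    "camera accessories": {"BT": ["photographic equipment"],
                           "NT": ["flash guns", "light meters", "tripods"],
                           "RT": ["cameras"]},
    "aqualungs": {"BT": ["diving equipment"]},
    "35 mm cameras": {"BT": ["film cameras"]},
}


def alphabetical_display(entries: dict[str, dict[str, list[str]]]) -> str:
    """Lay out entries A to Z, each followed by its indented BT, NT and RT lines."""
    lines = []
    for term in sorted(entries, key=str.lower):
        lines.append(term)
        for code in ("BT", "NT", "RT"):
            for i, related in enumerate(sorted(entries[term].get(code, []))):
                # label only the first term of each group; align the rest under it
                label = f"{code}: " if i == 0 else " " * (len(code) + 2)
                lines.append(f"  {label}{related}")
    return "\n".join(lines)


print(alphabetical_display(ENTRIES))
```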

22

Ways of displaying concepts and their paradigmatic relationships

2. Hierarchically - one tree for each facet

(fields of work)
. diving
. photography
. physics
. . optics

(people)
<people by age>
. infants
. children
. adults
<people by occupation>
. divers
. models (people)
. photographers
. physicists

(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits
. . . wet suits
. . face masks
. photo equipment
. . cameras
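The dotted hierarchical display can be produced by walking one tree per facet and prefixing each term with one ". " per level. A minimal Python sketch, assuming the trees are held as nested dictionaries (a hypothetical layout); the people facet and its arrays are left out for brevity.

```python
# Hypothetical facet trees: facet name -> nested {term: {narrower terms}}
FACETS = {
    "fields of work": {"diving": {}, "photography": {}, "physics": {"optics": {}}},
    "equipment": {
        "diving equipment": {
            "aqualungs": {},
            "diving suits": {"dry suits": {}, "wet suits": {}},
            "face masks": {},
        },
        "photo equipment": {"cameras": {}},
    },
}


def render(tree: dict, depth: int = 1) -> list[str]:
    """One line per term, prefixed by one '. ' per level of the hierarchy."""
    lines = []
    for term, narrower in tree.items():
        lines.append(". " * depth + term)
        lines.extend(render(narrower, depth + 1))
    return lines


def hierarchical_display(facets: dict) -> str:
    out = []
    for facet, tree in facets.items():
        out.append(f"({facet})")  # node label naming the facet
        out.extend(render(tree))
    return "\n".join(out)


print(hierarchical_display(FACETS))
```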

23

Ways of displaying concepts and their paradigmatic relationships

3. In subject groups or categories (microthesauri)
– one tree for each facet in each category

797.23: DIVING

(fields of work)
. diving
. . scuba diving
. . snorkel diving

(people)
. divers

(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits

770: PHOTOGRAPHY

(fields of work)
. photography
. . colour photography

(people)
. models (people)
. photographers

(equipment)
. photo equipment
. . cameras

24

Combining concepts: syntagmatic relationships

(places)
A1 Italy
A2 The Netherlands
A3 Russia

(people)
B1 potters
B2 repairers
B3 ceramicists

(activities)
C1 moulding
C2 throwing
C3 decoration

(objects)
D1 earthenware
D2 porcelain
D3 stoneware

Combine to express compound subjects, either post-coordinate, for searching:

porcelain AND decoration AND Russia

or pre-coordinate, for browsing:

porcelain decoration in Russia: D2C3A3

The labels in parentheses are node labels showing the facet names.
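In post-coordinate searching the concepts are combined only at search time: a document matches if it was indexed under every concept requested. A minimal Python sketch using the codes above; the three indexed documents are hypothetical.

```python
# Hypothetical index: each document is tagged with one code per concept
INDEX = {
    "doc1": {"D2", "C3", "A3"},   # porcelain, decoration, Russia
    "doc2": {"D2", "C2", "A1"},   # porcelain, throwing, Italy
    "doc3": {"D1", "C3", "A3"},   # earthenware, decoration, Russia
}


def post_coordinate(*codes: str) -> list[str]:
    """AND the concepts together: a document must carry every requested code."""
    wanted = set(codes)
    return sorted(doc for doc, tags in INDEX.items() if wanted <= tags)


print(post_coordinate("D2", "C3", "A3"))  # ['doc1']
print(post_coordinate("C3", "A3"))        # ['doc1', 'doc3']
```

A pre-coordinate scheme would instead assign doc1 the single compound notation D2C3A3 at indexing time, fixing the order in which the concepts are cited.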

25

Order of combining facets

thing - kind - part - property - material - process - operation - system operated on - product - by-product - agent - space - time - form

e.g. porcelain (thing) - decoration (process) - in Russia (space)

A facet may occur more than once in a string
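Applying the citation order is a sort: each component of a compound subject is tagged with its facet role and the components are arranged by the rank of that role. The Python sketch below is simplified and hypothetical; a stable sort keeps repeated roles in their input order, but it does not model the nesting that a full citation order needs when a facet recurs.

```python
# Citation order from the slide, as a list of facet roles
CITATION_ORDER = [
    "thing", "kind", "part", "property", "material", "process", "operation",
    "system operated on", "product", "by-product", "agent", "space", "time", "form",
]


def subject_string(components: list[tuple[str, str]]) -> str:
    """Order (term, role) pairs by citation order and join them into one string."""
    rank = {role: i for i, role in enumerate(CITATION_ORDER)}
    # sorted() is stable, so components with the same role keep their input order
    ordered = sorted(components, key=lambda pair: rank[pair[1]])
    return " - ".join(term for term, _ in ordered)


print(subject_string([("in Russia", "space"), ("decoration", "process"),
                      ("porcelain", "thing")]))
# porcelain - decoration - in Russia
```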

26

Faceted classification with processes subordinated to objects

(processes)
A ceramic production processes in general
AA forming in general
AAA coiling
AAB moulding
AAC throwing
AB decoration in general
ABA glazing
ABB transfer printing

(objects)
B ceramics in general
BB earthenware in general
(processes)
BB.AA forming of earthenware
BB.AAB moulding of earthenware
BB.AB decoration of earthenware
BB.ABA glazing of earthenware
BB.ABB transfer printing of earthenware
BC porcelain in general
(processes)
BC.AA forming of porcelain
BC.AAB moulding of porcelain

Words such as “in general”, “of earthenware” and “of porcelain” (shown in blue on the original slide) may be omitted, as they are implied by the hierarchical structure
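The notation for a process subordinated to an object is built by joining the object's notation and the process notation with a full stop, and the caption follows the same pattern. A minimal Python sketch of that synthesis using the codes on this slide; the function name is hypothetical.

```python
# Process and object notation, as on the slide
PROCESSES = {"AA": "forming", "AAB": "moulding", "AB": "decoration",
             "ABA": "glazing", "ABB": "transfer printing"}
OBJECTS = {"BB": "earthenware", "BC": "porcelain"}


def subordinate(obj: str, process: str) -> tuple[str, str]:
    """Attach a process to an object: notation 'object.process', caption 'process of object'."""
    return f"{obj}.{process}", f"{PROCESSES[process]} of {OBJECTS[obj]}"


print(subordinate("BB", "ABA"))   # ('BB.ABA', 'glazing of earthenware')

# Plain string sorting files each synthesised class directly after its object
print(sorted(["BC", "BB.AB", "BB", "BC.AAB", "BB.AA"]))
# ['BB', 'BB.AA', 'BB.AB', 'BC', 'BC.AAB']
```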

27

Faceted classification: generation of subject strings

(objects)
B ceramics
BB earthenware
(processes)
BB.AA forming
BB.AAB moulding
BB.AB decoration
BB.ABA glazing
BB.ABB transfer printing
BC porcelain
(processes)
BC.AA forming
BC.AAB moulding

ceramics > earthenware > forming
ceramics > earthenware > forming > moulding
ceramics > earthenware > decoration
ceramics > earthenware > decoration > glazing
ceramics > earthenware > decoration > transfer printing
ceramics > porcelain
ceramics > porcelain > forming
ceramics > porcelain > forming > moulding
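Subject strings of this kind are the paths from the top of the hierarchy down to each class. A minimal Python sketch, assuming the hierarchy is held as nested dictionaries of captions (a hypothetical layout); unlike the slide, it emits a string for every node, including "ceramics" and "ceramics > earthenware".

```python
# The object/process hierarchy from the slide, captions only
TREE = {
    "ceramics": {
        "earthenware": {
            "forming": {"moulding": {}},
            "decoration": {"glazing": {}, "transfer printing": {}},
        },
        "porcelain": {"forming": {"moulding": {}}},
    },
}


def subject_strings(tree: dict, path: tuple[str, ...] = ()) -> list[str]:
    """Walk the tree depth-first and emit the full path to every node."""
    strings = []
    for caption, children in tree.items():
        here = path + (caption,)
        strings.append(" > ".join(here))
        strings.extend(subject_strings(children, here))
    return strings


for s in subject_strings(TREE):
    print(s)
```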

28

Alphabetical index

ceramic production processes A
ceramics B
coiling : forming : ceramic production AAA
decoration : ceramic production AB
decoration : earthenware : ceramics BB.AB
earthenware : ceramics BB
forming : ceramic production AA
forming : earthenware : ceramics BB.AA
forming : porcelain : ceramics BC.AA
glazing : decoration : ceramic production ABA
glazing : decoration : earthenware : ceramics BB.ABA
moulding : earthenware : ceramics BB.AAB
moulding : forming : ceramic production AAB
moulding : porcelain : ceramics BC.AAB
porcelain : ceramics BC
throwing : forming : ceramic production AAC
transfer printing : decoration : ceramic production ABB
transfer printing : decoration : earthenware : ceramics BB.ABB
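Index entries like these can be derived from the hierarchy by reading each class's chain of captions upwards, from the most specific term to the broadest, and appending the notation; this is a simplified form of what is usually called chain indexing. The Python sketch below is hypothetical and includes every link in each chain, whereas the slide's index shortens or skips some links (for example "moulding : earthenware : ceramics").

```python
# (notation, captions from broadest to narrowest), following the slide's examples
CLASSES = [
    ("B", ["ceramics"]),
    ("BB", ["ceramics", "earthenware"]),
    ("BB.AB", ["ceramics", "earthenware", "decoration"]),
    ("BB.ABA", ["ceramics", "earthenware", "decoration", "glazing"]),
    ("BC", ["ceramics", "porcelain"]),
]


def chain_index(classes: list[tuple[str, list[str]]]) -> list[str]:
    """One entry per class: captions read upwards, then the notation, sorted A to Z."""
    entries = [" : ".join(reversed(chain)) + " " + notation for notation, chain in classes]
    return sorted(entries)


for entry in chain_index(CLASSES):
    print(entry)
```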

29

The same concepts viewed in different ways

Thesaurus view
• Good for searching if you know what you want
• Like a gazetteer
• Like a book’s index
• Gets quickly to individual concepts
• Usually arranged by facet
• Shows paradigmatic relationships
• Lets you combine concepts when searching

Classification view
• Good for browsing or surveying a topic
• Like a map
• Like a book’s contents page
• Shows related concepts together
• Usually arranged by discipline
• Shows syntagmatic and paradigmatic relationships
• Shows compound topics as pre-combined subject strings

30

Some clarifications

• A classification can be both hierarchical and faceted
• A classification built on faceted principles can be enumerative
• A symbolic notation is not essential, and should not determine the structure
• A classification can arrange compound topics in a useful linear sequence; a thesaurus cannot
• One-to-one mapping between a thesaurus and a classification is not possible
• A “guide to popular topics” may be used to supplement a systematic classification

31

Use of a thesaurus

• A thesaurus as a search aid with unindexed material
– Allows searching on terms linked to the term asked for

• Software support for formulating questions
– Browsing the thesaurus to choose terms
– Combining terms with AND, OR, NOT and ( )
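When searching unindexed (free-text) material, a thesaurus can expand each requested concept into an OR-group of linked terms, and the groups are then combined with AND. Wrapping each group in parentheses keeps the meaning unambiguous. A minimal Python sketch; the links (including "crop production") and the generic Boolean syntax are hypothetical, not those of any particular search engine.

```python
# Hypothetical links from each descriptor to terms worth searching alongside it
LINKED = {
    "agriculture": ["farming", "crop production"],
    "road vehicles": ["buses", "cars", "lorries"],
}


def expand(term: str) -> str:
    """OR a term with its linked terms, in parentheses so that it nests safely."""
    alternatives = [term] + LINKED.get(term, [])
    quoted = [f'"{t}"' if " " in t else t for t in alternatives]  # quote phrases
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concepts: str) -> str:
    """AND together one expanded OR-group per concept."""
    return " AND ".join(expand(c) for c in concepts)


print(build_query("agriculture", "road vehicles"))
```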

32

An ambiguous search interface

Does this mean: (lorries OR cars) AND diesel?

Or does it mean: lorries OR (cars AND diesel)?
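The two readings really do retrieve different documents. A small Python illustration with hypothetical sets of document numbers, where `|` is OR and `&` is AND:

```python
# Hypothetical postings: the documents indexed under each term
lorries = {1, 2}
cars = {3, 4}
diesel = {2, 4, 5}

reading_1 = (lorries | cars) & diesel   # (lorries OR cars) AND diesel
reading_2 = lorries | (cars & diesel)   # lorries OR (cars AND diesel)

print(sorted(reading_1))  # [2, 4]
print(sorted(reading_2))  # [1, 2, 4]
```

Python's own precedence rules give `&` priority over `|`, so writing `lorries | cars & diesel` without brackets silently produces the second reading. An interface that does not show how it groups terms leaves the searcher guessing in the same way.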

33

Thesaurus creation and management

• Standards
– BS/ISO standards give helpful guidance
– Draft revised BS standard now out for comments

• Software
– Many packages available
– Best if integrated with the database used for cataloguing

• Cooperative thesaurus development and use
– DIY is a major and continuing task

34

Thesaurus development never ends

• It is an ongoing task

• It needs a knowledgeable thesaurus editor

• It needs cooperation and input from indexers and users

• User feedback

35

What we need

• Software for the combined development of thesaurus and classification
– Thesaurofacet; Classaurus; ROOT; Bliss; Taxomita

• Software support for combining facets when searching, using a thesaurus (often referred to as faceted classification, but not the same thing)
– Flamenco; View-based searching; No zero match (NZM)

• Software support for browsing in a classified catalogue with notation, captions and an alphabetical index
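One plausible reading of "no zero match" is that, after each choice, the interface offers only those facet values that still retrieve at least one item, with a count beside each. A minimal Python sketch of that idea; the catalogue is hypothetical and the code is not based on Flamenco or any other named system.

```python
from collections import Counter

# Hypothetical catalogue: each item carries one value per facet
ITEMS = [
    {"material": "porcelain", "process": "decoration", "place": "Russia"},
    {"material": "porcelain", "process": "throwing", "place": "Italy"},
    {"material": "earthenware", "process": "decoration", "place": "The Netherlands"},
]


def facet_counts(items: list[dict], selected: dict[str, str]) -> dict[str, Counter]:
    """Count values of each unselected facet among the items matching the selection.

    A Counter only holds values that actually occur, so no offered choice
    can lead to an empty result.
    """
    matching = [i for i in items if all(i.get(f) == v for f, v in selected.items())]
    facets = {f for item in items for f in item} - set(selected)
    return {f: Counter(i[f] for i in matching if f in i) for f in sorted(facets)}


print(facet_counts(ITEMS, {"material": "porcelain"}))
```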

36

Links and further information

<http://www.willpowerinfo.co.uk/>