Slide 1
Enhancing search
An update on taxonomies, metadata and thesauri
Leonard Will, Willpower Information

Slide 2: Summary
1. Metadata creation is cataloguing
2. Taxonomies are classifications
3. Thesauri and classifications are complementary ways of grouping concepts
4. Facet analysis is a useful technique for constructing schemes systematically
5. Most computer search interfaces are inadequate

Slide 3: Metadata = catalogue records
Resources: any things that can be identified, e.g. documents, web pages, images, sound files, teaching packages, books, museum objects, people, organisations
Metadata: structured information about resources
  May be included with the resources themselves (e.g. Cataloguing in Publication (CIP) data) or collected in separate union catalogues (e.g. harvested using OAI-PMH)
  Some comes from the resource itself (size, format), some from external sources (provenance, location, accessibility)

Slide 4: Metadata standards
Anglo-American Cataloguing Rules (AACR)
Encoded Archival Description (EAD)
Learning Object Metadata (LOM)
Spectrum standard for museum information
Friend of a Friend (FOAF) and vCard
e-Government Metadata Standard (eGMS)
Dublin Core: the lowest common denominator

Slide 5: Kinds of standards
Content standards: which pieces of information are to be recorded (DC, AACR)
Value standards: how the information is to be recorded (= DC encoding schemes)
  formats (ISO date format, NCA name formats, AACR)
  lists of valid values (thesauri, authority files)
Structure standards: how the information is to be grouped and labelled for use by computers and humans (XML schemas, MARC)
Application profiles: choices from the above

Slide 6: Dublin Core metadata
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
+ element refinements

Slide 7: Subject
"Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource.
Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme."

Slide 8: Taxonomies = controlled vocabularies
Taxonomy: a woolly term that causes confusion; best kept for biological classification systems
Knowledge organization systems (KOS) is a better expression for the general concept
The main types are:
  thesauri
  classification schemes
  ontologies

Slide 9: Thesauri and classification schemes
Thesauri and classification schemes are alternative ways of showing concepts and their relationships
They are complementary, and both approaches are needed
Both can be built on the principles of facet analysis

Slide 10: Building blocks of all knowledge organisation schemes: concepts and relationships
35 mm cameras
  CC: H012
  BT: film cameras
aqualungs
  CC: D002
  BT: diving equipment
camera accessories
  CC: H002
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras

Slide 11: Relationships are between concepts, not words
[Diagram: a broader concept (BT) labelled by the words vehicles, road vehicles, conveyances and the class numbers 388.34, 629.2; a narrower concept (NT) labelled by the words cars, automobiles, autos, private cars, voitures and the class numbers 388.342, 629.222]
Choose one term as a descriptor to label the concept, e.g. cars USE automobiles

Slide 12: Preferred term substitution
"Anything on farming?"
"I use the term agriculture for farming, so I'll search for that."

Slide 13: Relationships between concepts
Paradigmatic, or a priori: relationships that apply generally, independently of any specific document
  shoes BT footwear
  shoes RT shoemakers
Syntagmatic, or a posteriori: concepts that are related only in the context of a specific document
  shoes : history
  shoes : prices
A thesaurus can show paradigmatic relationships; a classification scheme can also show syntagmatic ones

Slide 14: Searching hierarchies
"I need information on road vehicles."
"I know that buses, cars and lorries are all kinds of road vehicles, so I'll search for these terms as well as for road vehicles."

Slide 15: Searching related terms
"Please give me information about agriculture."
"OK, I'll look for that."
"Would you also be interested in items dealing with forestry, livestock or pet breeding?"

Slide 16: Paradigmatic relationships in a thesaurus
Many relationships are shown only as RT/RT (related term), but their nature is not specified, so they cannot be used for systematic grouping (ontologies overcome this)
The hierarchical generic-specific relationship (BT/NT) allows, and indeed requires, concepts to be grouped into facets: broader and narrower terms must be in the same facet

Slide 17: What is a facet? (Sometimes called a fundamental facet)
A high-level grouping of concepts of the same inherent category, e.g. activities, disciplines, people, materials, places, times. For example:
  animals, mice, daffodils and bacteria could all be members of a "living organisms" facet;
  digging, writing and cooking could all be members of an "activities" facet;
  birthdays, wars and football matches could all be members of an "events" facet.
A concept cannot belong to more than one facet

Slide 18: Facets in the Art & Architecture Thesaurus (AAT)
associated concepts (i.e. abstract concepts)
physical attributes
styles and periods
agents
activities
materials
objects

Slide 19: What is an array? (Sometimes called a subfacet)
A grouping of concepts within a facet by some stated characteristic of division
vehicles
  (node label showing a characteristic of division)
    bicycles
    tricycles
    four-wheeled vehicles
      automobiles
  (node label showing a characteristic of division)
    goods vehicles
      lorries
    passenger vehicles
      automobiles
      buses
Each group under a node label is an array. A concept may occur in more than one array, as automobiles does here

Slide 20: Parametric search
Searching for resources that have one or more specified characteristics, e.g. vehicles which have three wheels AND are used for carrying passengers
This is an important and useful aspect of post-coordinate searching, but it is not faceted classification

Slide 21: Ways of displaying concepts and their paradigmatic relationships
1. Alphabetically, with their relationships
35 mm cameras
  BT: film cameras
aqualungs
  BT: diving equipment
camera accessories
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras

Slide 22: Ways of displaying concepts and their paradigmatic relationships
2. Hierarchically: one tree for each facet
(fields of work)
. diving
. photography
. physics
.. optics
(people)
. infants
. children
. adults
. divers
. models (people)
. photographers
. physicists
(equipment)
. diving equipment
.. aqualungs
.. diving suits
... dry suits
... wet suits
.. face masks
. photo equipment
.. cameras

Slide 23: Ways of displaying concepts and their paradigmatic relationships
3. In subject groups or categories (microthesauri): one tree for each facet in each category
797.23: DIVING
(fields of work)
. diving
.. scuba diving
.. snorkel diving
(people)
. divers
(equipment)
. diving equipment
.. aqualungs
.. diving suits
... dry suits
770: PHOTOGRAPHY
(fields of work)
. photography
.. colour photography
(people)
. models (people)
. photographers
(equipment)
. photo equipment
.. cameras

Slide 24: Combining concepts: syntagmatic relationships
(places)      A1 Italy, A2 The Netherlands, A3 Russia
(people)      B1 potters, B2 repairers, B3 ceramicists
(activities)  C1 moulding, C2 throwing, C3 decoration
(objects)     D1 earthenware, D2 porcelain, D3 stoneware
(The node labels in brackets show the facet names.)
Combine concepts to express compound subjects, either
  post-coordinate, for searching: porcelain AND decoration AND Russia
  or pre-coordinate, for browsing: porcelain decoration in Russia = D2C3A3

Slide 25: Order of combining facets
thing - kind - part - property - material - process - operation - system operated on - product - by-product - agent - space - time - form
e.g. porcelain (thing) - decoration (process) - in Russia (space)
A facet may occur more than once in a string

Slide 26: Faceted classification with processes subordinated to objects
(processes)
A        ceramic production processes in general
AA         forming in general
AAA          coiling
AAB          moulding
AAC          throwing
AB         decoration in general
ABA          glazing
ABB          transfer printing
(objects)
B        ceramics in general
BB         earthenware in general
             (processes)
BB.AA          forming of earthenware
BB.AAB           moulding of earthenware
BB.AB          decoration of earthenware
BB.ABA           glazing of earthenware
BB.ABB           transfer printing of earthenware
BC         porcelain in general
             (processes)
BC.AA          forming of porcelain
BC.AAB           moulding of porcelain
The qualifying words ("in general", "of earthenware", "of porcelain"; shown in blue on the original slide) may be omitted, as they are implied by the hierarchical structure

Slide 27: Faceted classification: generation of subject strings
(objects)
B        ceramics
BB         earthenware
             (processes)
BB.AA          forming
BB.AAB           moulding
BB.AB          decoration
BB.ABA           glazing
BB.ABB           transfer printing
BC         porcelain
             (processes)
BC.AA          forming
BC.AAB           moulding
Generated subject strings:
ceramics > earthenware > forming
ceramics > earthenware > forming > moulding
ceramics > earthenware > decoration
ceramics > earthenware > decoration > glazing
ceramics > earthenware > decoration > transfer printing
ceramics > porcelain
ceramics > porcelain > forming
ceramics > porcelain > forming > moulding

Slide 28: Alphabetical index
ceramic production processes   A
ceramics   B
coiling : forming : ceramic production   AAA
decoration : ceramic production   AB
decoration : earthenware : ceramics   BB.AB
earthenware : ceramics   BB
forming : ceramic production   AA
forming : earthenware : ceramics   BB.AA
forming : porcelain : ceramics   BC.AA
glazing : decoration : ceramic production   ABA
glazing : decoration : earthenware : ceramics   BB.ABA
moulding : earthenware : ceramics   BB.AAB
moulding : forming : ceramic production   AAB
moulding : porcelain : ceramics   BC.AAB
porcelain : ceramics   BC
throwing : forming : ceramic production   AAC
transfer printing : decoration : ceramic production   ABB
transfer printing : decoration : earthenware : ceramics   BB.ABB

Slide 29: The same concepts viewed in different ways
Thesaurus view:
  good for searching if you know what you want
  like a gazetteer
  like a book's index
  gets quickly to individual concepts
  usually arranged by facet
  shows paradigmatic relationships
  lets you combine concepts when searching
Classification view:
  good for browsing or surveying a topic
  like a map
  like a book's contents page
  shows related concepts together
  usually arranged by discipline
  shows syntagmatic and paradigmatic relationships
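The thesaurus mechanics described on slides 11 to 15 (USE references, BT/NT hierarchies, RT links) and the alphabetical display on slide 21 can be sketched in a few lines of Python. This is a minimal sketch, not any particular thesaurus package: the class and method names (`Thesaurus`, `Concept`, `expand_down`, `related_terms`, `entry`) are hypothetical, the terms are taken from the slides, and the model deliberately leaves out scope notes, qualifiers and typed relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept, labelled by its preferred term (the descriptor)."""
    preferred: str
    broader: set = field(default_factory=set)  # BT: broader terms
    related: set = field(default_factory=set)  # RT: related terms


class Thesaurus:
    """Hypothetical minimal thesaurus: USE references, BT/NT hierarchy and RT links."""

    def __init__(self):
        self.concepts = {}  # preferred term -> Concept
        self.use = {}       # non-preferred term -> preferred term (USE)

    def _get(self, term):
        if term not in self.concepts:
            self.concepts[term] = Concept(term)
        return self.concepts[term]

    def add(self, term, broader=(), related=(), non_preferred=()):
        concept = self._get(term)
        for bt in broader:
            self._get(bt)
            concept.broader.add(bt)
        for rt in related:
            self._get(rt).related.add(term)  # RT is reciprocal
            concept.related.add(rt)
        for label in non_preferred:
            self.use[label] = term

    def preferred(self, term):
        """Preferred term substitution, e.g. farming USE agriculture."""
        return self.use.get(term, term)

    def narrower(self, term):
        """NT is the reciprocal of BT, so it is derived rather than stored."""
        return {c.preferred for c in self.concepts.values() if term in c.broader}

    def expand_down(self, term):
        """Searching a hierarchy: the term plus its narrower terms at every level."""
        found, stack = set(), [self.preferred(term)]
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self.narrower(t))
        return found

    def related_terms(self, term):
        """Related terms to offer the searcher, not to add automatically."""
        concept = self.concepts.get(self.preferred(term))
        return set(concept.related) if concept else set()

    def entry(self, term):
        """An alphabetical-display entry with its BT, NT and RT lines."""
        c = self.concepts[self.preferred(term)]
        lines = [c.preferred]
        for tag, terms in (("BT", c.broader),
                           ("NT", self.narrower(c.preferred)),
                           ("RT", c.related)):
            lines += [f"  {tag}: {t}" for t in sorted(terms)]
        return lines


th = Thesaurus()
th.add("road vehicles", broader=["vehicles"])
th.add("automobiles", broader=["road vehicles"], non_preferred=["cars", "autos"])
th.add("buses", broader=["road vehicles"])
th.add("lorries", broader=["road vehicles"])
th.add("agriculture", related=["forestry", "livestock", "pet breeding"],
       non_preferred=["farming"])

print(th.preferred("farming"))                   # agriculture
print(sorted(th.expand_down("road vehicles")))   # ['automobiles', 'buses', 'lorries', 'road vehicles']
print(sorted(th.related_terms("farming")))       # ['forestry', 'livestock', 'pet breeding']
print("\n".join(th.entry("road vehicles")))
```

Expanding downwards can be automatic, as for the searcher on slide 14, whereas related terms are only offered, as on slide 15, because an RT link says nothing about the nature of the relationship.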
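The contrast on slide 24 between post-coordinate and pre-coordinate combination can be illustrated with a short sketch. It assumes the four facets and codes shown on that slide, and a citation order of objects (thing), activities (process), people (agent), places (space) taken from slide 25; the function names and the document identifiers in the toy index are hypothetical, invented only for illustration.

```python
# Facets and codes from slide 24.
FACETS = {
    "places":     {"Italy": "A1", "The Netherlands": "A2", "Russia": "A3"},
    "people":     {"potters": "B1", "repairers": "B2", "ceramicists": "B3"},
    "activities": {"moulding": "C1", "throwing": "C2", "decoration": "C3"},
    "objects":    {"earthenware": "D1", "porcelain": "D2", "stoneware": "D3"},
}

# Assumed citation order, following slide 25:
# thing (objects) - process (activities) - agent (people) - space (places).
CITATION_ORDER = ["objects", "activities", "people", "places"]


def facet_of(concept):
    """Each concept belongs to exactly one facet."""
    for facet, members in FACETS.items():
        if concept in members:
            return facet
    raise KeyError(concept)


def pre_coordinate(concepts):
    """Combine at indexing time: one notation, built in citation order."""
    ordered = sorted(concepts, key=lambda c: CITATION_ORDER.index(facet_of(c)))
    return "".join(FACETS[facet_of(c)][c] for c in ordered)


def post_coordinate(index, *concepts):
    """Combine at search time: AND together the documents indexed by each concept."""
    postings = [index.get(c, set()) for c in concepts]
    return set.intersection(*postings) if postings else set()


# Hypothetical postings: concept -> identifiers of documents indexed with it.
toy_index = {
    "porcelain": {"doc1", "doc2"},
    "decoration": {"doc1", "doc3"},
    "Russia": {"doc1", "doc2", "doc3"},
}

print(pre_coordinate(["Russia", "decoration", "porcelain"]))             # D2C3A3
print(post_coordinate(toy_index, "porcelain", "decoration", "Russia"))  # {'doc1'}
```

The pre-coordinate notation fixes one order, which is what makes it browsable on a shelf or in a listing; the post-coordinate search lets the searcher supply the concepts in any order.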
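The subject strings on slide 27 and the alphabetical index on slide 28 can both be derived mechanically from the notation, since each class's broader class is the nearest shorter notation that exists in the schedule. The sketch below assumes that rule and encodes only the classes shown on slides 26 and 27; the function names are hypothetical. It repeats every link of the chain in full, so some index entries read slightly differently from the hand-edited ones on slide 28 (for instance it gives "moulding : forming : earthenware : ceramics", and it uses "ceramic production processes" throughout).

```python
# Schedule from slides 26-27: notation -> caption, with implied qualifiers omitted.
SCHEDULE = {
    "A": "ceramic production processes",
    "AA": "forming", "AAA": "coiling", "AAB": "moulding", "AAC": "throwing",
    "AB": "decoration", "ABA": "glazing", "ABB": "transfer printing",
    "B": "ceramics",
    "BB": "earthenware",
    "BB.AA": "forming", "BB.AAB": "moulding",
    "BB.AB": "decoration", "BB.ABA": "glazing", "BB.ABB": "transfer printing",
    "BC": "porcelain",
    "BC.AA": "forming", "BC.AAB": "moulding",
}


def parent(notation):
    """Nearest broader class: the longest shorter prefix that is itself in the schedule."""
    n = notation[:-1].rstrip(".")
    while n and n not in SCHEDULE:
        n = n[:-1].rstrip(".")
    return n or None


def chain(notation):
    """Captions from the top of the hierarchy down to this class."""
    links = []
    n = notation
    while n is not None:
        links.append(SCHEDULE[n])
        n = parent(n)
    return links[::-1]


def subject_string(notation):
    return " > ".join(chain(notation))


def chain_index():
    """Alphabetical index: each class read upwards, most specific term first."""
    return sorted((" : ".join(reversed(chain(n))), n) for n in SCHEDULE)


for notation in ("BB.AA", "BB.AAB", "BC.AAB"):
    print(subject_string(notation))
for entry, notation in chain_index():
    print(f"{entry}   {notation}")
```

Because the index is generated from the same hierarchy as the classified sequence, it gives the thesaurus-like alphabetical route into the classification view contrasted on slide 29.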