An introduction to metadata in digital projects Jenn Riley Metadata Librarian L566 Fall 2006.


An introduction to metadata in digital projects

Jenn Riley

Metadata Librarian

L566 Fall 2006


10/17/06 L566 Fall 2006 2

Topics we’ll cover

Choosing descriptive metadata standards
Choosing controlled vocabularies
Using controlled vocabularies to enhance searching and browsing
Wrapping it all up


Choosing descriptive metadata standards


Descriptive metadata

Enables users to find relevant materials
Used by many different knowledge domains
Many potential representations
Controlled by
  Data structure standards
  Data content standards
  Syntax encoding schemes
  Vocabulary encoding schemes


Some data structure standards

Dublin Core (DC)
  Unqualified (simple)
  Qualified
MAchine Readable Cataloging (MARC)
MARC in XML (MARCXML)
Metadata Object Description Schema (MODS)


How do I pick one? (1)

Institution
  Nature of holding institution
  Resources available for metadata creation
  What others in the community are doing
  Formats supported by your delivery software
The standard
  Purpose
  Structure
  Context
  History


How do I pick one? (2)

Materials
  Genre
  Format
  Likely audiences
  What metadata already exists for these materials
Project goals
  Robustness needed for the given materials and users
  Describing multiple versions
  Mechanisms for providing relationships between records
  Plan for interoperability, including repeatability of elements

More information on handout


Dublin Core (DC)

15-element set
National and international standard
  2001: Released as ANSI/NISO Z39.85
  2003: Released as ISO 15836
Maintained by the Dublin Core Metadata Initiative (DCMI)
Other players
  DCMI Working Groups
  DC Usage Board


DCMI mission

The Dublin Core Metadata Initiative provides simple standards to facilitate the finding, sharing and management of information.
DCMI does this by:
  Developing and maintaining international standards for describing resources
  Supporting a worldwide community of users and developers
  Promoting widespread use of Dublin Core solutions


DC Principles

“Core” across all knowledge domains
No element required
All elements repeatable
1:1 principle


DCMI Abstract Model

Released in 2005
“A reference model against which particular DC encoding guidelines can be compared”
Heavily influenced by RDF thinking
New XML and RDF encodings under development to conform to the abstract model
Two schools of thought on its development
  Clarifies model underlying the metadata standard
  Overly complicates a standard intended to be simple


DC encodings

HTML <meta>
XML
RDF
[Spreadsheets]
[Databases]
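As a rough illustration of two of these encodings, the sketch below writes one simple DC description out as XML and as HTML <meta> tags. The record values, the `metadata` wrapper element, and the function names are made up for this example; the `http://purl.org/dc/elements/1.1/` namespace and the `DC.` naming convention for meta tags are the parts taken from DCMI practice.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: (element, value) pairs, so any element can repeat.
record = [
    ("title", "Main Street at dusk"),
    ("creator", "Example, Photographer"),
    ("subject", "Streets"),
    ("subject", "Automobiles"),
    ("date", "1952-06-14"),
]

def to_xml(pairs):
    """Serialize the pairs as dc:* children of an assumed wrapper element."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

def to_html_meta(pairs):
    """Serialize the pairs as HTML <meta> tags using the DC. name prefix."""
    lines = ['<link rel="schema.DC" href="%s">' % DC_NS]
    for name, value in pairs:
        content = escape(value, quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (name, content))
    return "\n".join(lines)

xml_text = to_xml(record)
html_text = to_html_meta(record)
print(xml_text)
print(html_text)
```

Keeping the record as a list of pairs rather than a dictionary is a deliberate choice here: it lets the subject element appear twice, which matches the principle that all DC elements are repeatable.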


Content/value standards for DC

None required
Some elements recommend a content or value standard as a best practice
  Relation
  Source
  Subject
  Type
  Coverage
  Date
  Format
  Language
  Identifier


Some limitations of simple DC

Can’t indicate a main title vs. other subordinate titles
No method for specifying creator roles
W3CDTF format can’t indicate date ranges or uncertainty
Can’t by itself provide robust record relationships
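To make the date limitation concrete: W3CDTF accepts a single date or date-time at one of six levels of granularity, and nothing else. The sketch below is a syntax-only check written for this example (the function name `is_w3cdtf` is hypothetical, and it does not verify that a month or day number is valid). Ranges and uncertain dates simply do not match.

```python
import re

# W3CDTF allows one date or date-time at six levels of granularity:
# YYYY, YYYY-MM, YYYY-MM-DD, and three date-time forms that all
# require a time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?"
    r")?"
)

def is_w3cdtf(value):
    """Return True if value matches the W3CDTF pattern (syntax only)."""
    return W3CDTF.fullmatch(value) is not None

for candidate in ["1952", "1952-06", "1952-06-14", "1952-06-14T09:30Z",
                  "1938/1969", "ca. 1952", "195?"]:
    print(candidate, is_w3cdtf(candidate))
```

The last three candidates are the kinds of values archival description often needs (a range, an approximate date, an unknown final digit), and none of them can be expressed in the format.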


Good times to use DC

Cross-collection searching
Cross-domain discovery
Metadata sharing
Describing some types of simple resources
Metadata creation by novices


Format                      Record format          Field labels  Reliance on AACR  Common method of creation
DC [record]                 XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
QDC [record] [collection]
MARC [record] [collection]
MARCXML [record]
MODS [record] [collection]


Qualified Dublin Core (QDC)

Adds some increased specificity to Unqualified Dublin Core
Same governance structure as DC
Same encodings as DC
Same content/value standards as DC
Listed in DCMI Terms
Additional principles
  Extensibility
  Dumb-down principle
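The dumb-down principle says a consumer should be able to ignore a qualifier and still use the value as if it were unqualified. A minimal sketch of that idea follows. The refinement-to-element pairs are taken from DCMI Metadata Terms but the table is deliberately partial, and the choice to drop unrecognized local extensions, the sample record, and the name `dumb_down` are assumptions for illustration.

```python
# Element refinements from DCMI Metadata Terms mapped to the simple DC
# element each one refines (a partial table, for illustration only).
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
}

SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dumb_down(qualified_record):
    """Reduce (term, value) pairs to simple DC, dropping unknown terms."""
    simple = []
    for term, value in qualified_record:
        element = REFINES.get(term, term)
        if element in SIMPLE_DC:
            simple.append((element, value))
    return simple

qdc = [
    ("title", "Main Street at dusk"),
    ("alternative", "Main St."),
    ("created", "1952-06-14"),
    ("spatial", "Bloomington (Ind.)"),
    ("localShelfMark", "P01234"),  # hypothetical local extension
]
print(dumb_down(qdc))
```

The result is less specific (the alternative title becomes just another title), which is the trade-off the principle accepts in exchange for interoperability.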


Types of DC qualifiers

Additional elements
Element refinements
Encoding schemes
  Vocabulary encoding schemes
  Syntax encoding schemes


DC qualifier status

Recommended
Conforming
Obsolete
Registered


Limitations of QDC

Widely misunderstood
No method for specifying creator roles
W3CDTF format can’t indicate date ranges or uncertainty
Split across 3 XML schemas


Best times to use QDC

More specificity needed than simple DC, but not a fundamentally different approach to description

Want to share DC with others, but need a few extensions for your local environment

Describing some types of simple resources

Metadata creation by novices


Format                      Record format          Field labels  Reliance on AACR  Common method of creation
DC [record]                 XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
QDC [record] [collection]   XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
MARC [record] [collection]
MARCXML [record]
MODS [record] [collection]


MAchine Readable Cataloging (MARC)

Format for the records in IUCAT, WorldCat and other library catalogs
Used for library metadata since the 1960s
  Adopted as national standard in 1971
  Adopted as international standard in 1973
Maintained by:
  Network Development and MARC Standards Office at the Library of Congress
  Standards and Support Office at the National Library of Canada


More about MARC

Actually a family of MARC standards throughout the world
  U.S. & Canada use MARC21
  MARC Bibliographic is for descriptive metadata
Structured as a binary interchange format
  ANSI/NISO Z39.2
  ISO 2709
Field names
  Numeric fields
  Alphabetic subfields


Content/value standards for MARC

None required by the format itself
But US record creation practice relies heavily on:
  AACR2r
  ISBD
  LCNAF
  LCSH


Limitations of MARC

Use of all its potential is time-consuming
OPACs don’t make full use of all possible data
OPACs virtually the only systems to use MARC data
Requires highly-trained staff to create
Local practice differs greatly


Good times to use MARC

Integration with other records in OPAC
Resources are like those traditionally found in library catalogs
Maximum compatibility with other libraries is needed
Have expert catalogers for metadata creation


Format                      Record format          Field labels  Reliance on AACR  Common method of creation
DC [record]                 XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
QDC [record] [collection]   XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
MARC [record] [collection]  ISO 2709 [ANSI Z39.2]  Numeric       Strong            By specialists
MARCXML [record]
MODS [record] [collection]


MARC in XML (MARCXML)

Copies the exact structure of MARC21 in an XML syntax
  Numeric fields
  Alphabetic subfields

Implicit assumption that content/value standards are the same as in MARC
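To show what “numeric fields, alphabetic subfields” looks like once it is carried into XML, here is a small sketch that builds two MARCXML data fields. The `http://www.loc.gov/MARC21/slim` namespace and the `record`, `datafield` and `subfield` element names come from the MARCXML schema; the field values and the helper function are invented for the example, and a real record would also need a leader and control fields, which are omitted here.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

def datafield(parent, tag, ind1, ind2, subfields):
    """Append one MARCXML datafield with its subfields to parent."""
    field = ET.SubElement(parent, f"{{{MARC_NS}}}datafield",
                          {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field

ET.register_namespace("marc", MARC_NS)
record = ET.Element(f"{{{MARC_NS}}}record")

# Hypothetical title statement (245) and topical subject (650) fields.
datafield(record, "245", "1", "0", [("a", "Main Street at dusk /"),
                                    ("c", "Example Photographer.")])
datafield(record, "650", " ", "0", [("a", "Streets"), ("z", "Indiana")])

marcxml = ET.tostring(record, encoding="unicode")
print(marcxml)
```

Note that the tags and subfield codes end up as attribute values rather than element names, which is why the syntax is so verbose and why an XML schema alone cannot check much about the content of a given field.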


Limitations of MARCXML

Not appropriate for direct data entry
Extremely verbose syntax
Full content validation requires tools external to XML Schema conformance


Good times to use MARCXML

As a transition format between a MARC record and another XML-encoded metadata format

Materials lend themselves to library-type description

Need more robustness than DC offers
Want XML representation to store within larger digital object but need lossless conversion to MARC


Format                      Record format          Field labels  Reliance on AACR  Common method of creation
DC [record]                 XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
QDC [record] [collection]   XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
MARC [record] [collection]  ISO 2709 [ANSI Z39.2]  Numeric       Strong            By specialists
MARCXML [record]            XML                    Numeric       Strong            By derivation
MODS [record] [collection]


Metadata Object Description Schema (MODS)

Developed and managed by the Library of Congress Network Development and MARC Standards Office
First released for trial use June 2002
MODS 3.2 released June 2006
“Schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”


Differences between MODS and MARC

MODS is “MARC-like” but intended to be simpler

Textual tag names
Encoded in XML
Some specific changes
  Some regrouping of elements
  Removes some elements
  Adds some elements


Content/value standards for MODS

Some elements indicate a given content/value standard should be used
  Generally follows MARC/AACR2/ISBD conventions
  But not all enforced by the MODS XML schema

Authority attribute available on some elements
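A short sketch of how the authority attribute can appear in practice: the fragment below builds a minimal MODS description with textual element names and an `authority` value on the name and subject. The `http://www.loc.gov/mods/v3` namespace and the element names are from the MODS schema; the record content and the `el` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def el(parent, name, text=None, **attrs):
    """Create a MODS child element with optional text and attributes."""
    child = ET.SubElement(parent, f"{{{MODS_NS}}}{name}", attrs)
    child.text = text
    return child

ET.register_namespace("mods", MODS_NS)
mods = ET.Element(f"{{{MODS_NS}}}mods")

title_info = el(mods, "titleInfo")
el(title_info, "title", "Main Street at dusk")

# authority names the vocabulary the value is supposed to come from;
# the schema does not check that the value really appears there.
name = el(mods, "name", type="personal", authority="naf")
el(name, "namePart", "Example, Photographer")

subject = el(mods, "subject", authority="lcsh")
el(subject, "topic", "Streets")

print(ET.tostring(mods, encoding="unicode"))
```

The attribute is a declaration, not a guarantee: a record can claim `lcsh` and still carry a heading that is not in LCSH, so any real checking has to happen outside schema validation.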


Limitations of MODS

No lossless round-trip conversion from and to MARC

Still largely implemented by library community only

Some semantics of MARC lost
Format still growing to meet the needs of the digital library community


Good times to use MODS

Materials lend themselves to library-type description

Want to reach both library and non-library audiences

Need more robustness than DC offers
Want XML representation to store within larger digital object


Format                      Record format          Field labels  Reliance on AACR  Common method of creation
DC [record]                 XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
QDC [record] [collection]   XML, RDF, (X)HTML      Text          None              By novices, by specialists, and by derivation
MARC [record] [collection]  ISO 2709 [ANSI Z39.2]  Numeric       Strong            By specialists
MARCXML [record]            XML                    Numeric       Strong            By derivation
MODS [record] [collection]  XML                    Text          Implied           By specialists and by derivation


Picking a format

Consider all options
Match format to the types of discovery you want to support
Your choice has to fit in your larger technological infrastructure
  Realize the constraints you’re operating under
  Or, expand infrastructure!

Don’t have to choose just one, can use several for different purposes


Mapping between metadata formats

Also called “crosswalking”
To create “views” of metadata for specific purposes
Mapping from robust format to more general format is common
Mapping from general format to more robust format is ineffective


Types of mapping logic

Mapping the complete contents of one field to another

Splitting multiple values in a single local field into multiple fields in the target schema

Translating anomalous local practices into a more generally useful value

Splitting data in one field into two or more fields
Transforming data values
Boilerplate values to include in output schema
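The six kinds of mapping logic listed above can be sketched in a few lines. Everything specific here is hypothetical: the local field names (`caption`, `keywords`, `kind`, `place_and_date`), the sample values, and the boilerplate publisher are invented for the example; only the simple DC element names and the DCMI Type terms `StillImage` and `MovingImage` are real.

```python
def crosswalk(local):
    """Map a hypothetical local record (a dict) to simple DC pairs."""
    dc = []
    # 1. Complete contents of one field copied to another.
    dc.append(("title", local["caption"]))
    # 2. Multiple values in one local field split into repeated fields.
    for term in local["keywords"].split(";"):
        dc.append(("subject", term.strip()))
    # 3. Anomalous local practice translated to a more general value.
    local_types = {"slide": "StillImage", "reel": "MovingImage"}
    dc.append(("type", local_types[local["kind"]]))
    # 4. Data in one field split into two different fields.
    place, raw_date = [part.strip()
                       for part in local["place_and_date"].split(";")]
    dc.append(("coverage", place))
    # 5. Data value transformed (here M/D/YYYY to YYYY-MM-DD).
    month, day, year = raw_date.split("/")
    dc.append(("date", f"{year}-{int(month):02d}-{int(day):02d}"))
    # 6. Boilerplate value supplied for every output record.
    dc.append(("publisher", "Example University Archives"))
    return dc

local_record = {
    "caption": "Main Street at dusk",
    "keywords": "Streets; Automobiles",
    "kind": "slide",
    "place_and_date": "Chicago, Illinois; 6/14/1952",
}
for element, value in crosswalk(local_record):
    print(element, "=", value)
```

In a real project each of these rules would come out of an analysis of the source data, and records that do not fit the expected pattern (a missing semicolon, an unlisted local type) would need explicit handling rather than the bare lookups used in this sketch.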


Common mapping pitfalls

Cramming in too much information
Leaving in trailing punctuation
Missing context of records
Meaningless placeholder data

ALWAYS remember the purpose of the metadata you are creating!
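Two of these pitfalls, trailing punctuation and placeholder data, are easy to guard against in code. The sketch below is one possible cleanup step. The placeholder list and the set of punctuation marks stripped are assumptions and would need tuning against real source records, and the name `clean_value` is hypothetical.

```python
import re

# Values that carry no information for a user of the target record.
# This list is an assumption; adjust it to what the source data contains.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "[s.n.]", "[s.l.]"}

def clean_value(value):
    """Return a cleaned value, or None if it is only a placeholder."""
    text = value.strip()
    if text.lower() in PLACEHOLDERS:
        return None
    # Drop ISBD-style punctuation left at the end of a MARC subfield.
    text = re.sub(r"\s*[/:;,]\s*$", "", text)
    return text or None

for raw in ["Main Street at dusk /", "Chicago :", "Unknown", "[s.n.]"]:
    print(repr(raw), "->", repr(clean_value(raw)))
```

Trailing periods are left alone on purpose, since removing them blindly would damage abbreviations and initials.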


No, really, which one do I pick?

It depends. Sorry.
Be as robust as you can afford
Plan for future uses of the metadata you create
Leverage existing expertise as much as possible
Focus on content and value standards as much as possible


More information

Dublin Core
  DC Element Set version 1.1
  DCMI Metadata Terms
MODS
MARC
MARCXML


Break time!


Choosing controlled vocabularies


Some characteristics of CVs

Also known as “vocabulary encoding schemes”

Enumerated lists of all possible choices for a field value

Often organized into a syndetic structure
Usually intended to be human-readable


CVs in libraries

Many library CVs grow constantly with catalogers contributing new terms

Many library CVs use content standards to dictate the form of headings

Fields that use CVs are said to be under “authority control”


Traditional uses of CVs in library catalog records

Collocation
Disambiguation
Interoperability

BROWSING! (Although this isn’t used much in libraries…)


Other considerations

Human cataloging using CVs is expensive
Developing and maintaining CVs is expensive
Current library systems usually rely on the same string being present in all records rather than true relational structures linking records to CV terms


When a controlled vocabulary is useful

User browsing of a small number of categories each with a large number of members

When many different things have the same label

When recall is a priority for a given access point


Some common fields using CVs

Names
Places
“Subjects”


Names

Seeking works by or about a certain individual is frequent

Individuals are often known by many different names

Many different individuals have the same name
Name authority lists often create uniqueness by adding qualifiers
Some example vocabularies:
  Library of Congress Name Authority File (LCNAF)
  Getty Union List of Artist Names (ULAN)


Places

Common in libraries to control place names in subjects, but not publication places

Many different places with the same name
Often organized hierarchically
Commonly used vocabularies:
  Library of Congress Subject Headings (LCSH)
  Getty Thesaurus of Geographic Names (TGN)
  GEOnet Names Server


“Subjects”

Libraries traditionally group topic, location, genre, form, time period and other related concepts all under “subject”

Often organized into a rich syndetic structure

General rule is to apply the most specific heading applicable

Involves subjective judgment on the part of the individual assigning the heading


Deciding which fields to place under authority control

Consider your budgetary restraints
Learn about the functionalities possible in your system
Identify appropriate vocabularies that meet defined needs
Develop a clear plan for how the fields with controlled values will be used


Using controlled vocabularies to enhance searching and browsing


Case Study: Cushman Collection

Funded with an Institute of Museum & Library Services (IMLS) grant

~15,000 color slides taken between 1938 and 1969

Cushman provided a significant amount of description

Additional metadata created to enhance genre, subject and geographic access


Metadata for the Cushman Collection

Cushman’s description
  Dates
  Location
  Names
TGM I – LC Thesaurus for Graphic Materials: Subject Terms
TGM II – LC Thesaurus for Graphic Materials: Genre & Physical Characteristics
TGN – Getty Thesaurus of Geographic Names
We wanted to use this high-quality metadata to improve on past search systems


TGM I: Subject Terms Strengths and Weaknesses

Strengths include:
  Pre-defined relationships between concepts
  Some lead-in vocabulary
Weaknesses include:
  Syndetic relationship lacking for new terms
  Language not user-friendly
  Not enough lead-in vocabulary
  Form and number of top-level categories not useful for a browse structure


User studies performed

Two types
  Group walkthroughs of prototypes
  Task scenario study
Some functionality suggested by the studies
  Refinement while searching
  Search suggestions
  Faceted browsing
  Browsing on subject terms at all levels
  CV interaction


Browsing Image Collections

Research shows:
  Browsing is exploratory (Bawden)
  Guided, flexible browsing in context works (Flamenco and SI Art Image Browser projects)
Our usability studies show:
  Structure is important
  Contents should be easily exposed
  Flexible and combinatorial browsing is desired
  Browsing cultivates searching


Searching Image Collections

Research shows:
  Using thesaurus structure helps searching (Greenberg)
    Automatic expansion of synonyms and narrower terms
    User-initiated expansion of broader and related terms
Our usability studies show:
  Referencing an A-Z list with no lead-in terms for searching is NOT helpful at all
  Concerns about word choice
  Iterative reformulation of queries in context is desired


Cushman Specifications: Browsing

Date
Genre
Subjects (hierarchical)
  Retrieval of all records with narrower terms
Location (hierarchical)
Combination of categories


Cushman Specifications: Searching

Integrated search against BOTH “free-text” descriptions and thesaurus
Mapping from lead-in vocabulary
Retrieval of all records with narrower terms
User-initiated broadening and narrowing
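The first three search specifications can be sketched together. This is not the Cushman implementation; it is a minimal illustration under stated assumptions. The thesaurus fragment, the records, and the function names `expand` and `search` are all hypothetical, and the lookup of preferred terms is case-sensitive to keep the example short.

```python
# Hypothetical thesaurus fragment: lead-in (USE) terms and narrower terms.
USE = {"cars": "Automobiles", "autos": "Automobiles"}
NARROWER = {
    "Vehicles": ["Automobiles", "Bicycles"],
    "Automobiles": ["Convertible automobiles"],
}

# Hypothetical records: free-text description plus assigned subject terms.
RECORDS = [
    {"id": 1, "description": "Red convertible parked downtown",
     "subjects": ["Convertible automobiles"]},
    {"id": 2, "description": "Boy with bicycle", "subjects": ["Bicycles"]},
    {"id": 3, "description": "Cars waiting at a ferry", "subjects": []},
]

def expand(term):
    """Return the term plus all of its narrower terms, at every level."""
    found, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(NARROWER.get(current, []))
    return found

def search(query):
    """Match the query against free text and the expanded thesaurus terms."""
    preferred = USE.get(query.lower(), query)  # map lead-in vocabulary
    terms = set(expand(preferred))             # include narrower terms
    hits = []
    for record in RECORDS:
        in_text = query.lower() in record["description"].lower()
        in_subjects = bool(terms & set(record["subjects"]))
        if in_text or in_subjects:
            hits.append(record["id"])
    return hits

print(search("cars"))
print(search("Vehicles"))
```

A search for “cars” finds record 3 through its free-text description and record 1 through the thesaurus, even though the word “cars” never appears in record 1. User-initiated broadening would work the same way in the other direction, offering the broader term as a suggestion instead of applying it automatically.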


Wrapping it all up


What next?

After choosing metadata standards and controlled vocabularies
  Figure out where metadata creation fits in the overall workflow
  Write metadata creation guidelines
  Design and implement a metadata creation process


And there’s more

Other types of metadata
  Content markup
  Technical metadata
  Rights metadata
  Preservation metadata
  Structural metadata
Specialized metadata standards
When to create a local metadata format


In a grant proposal (1)

Give specific information on all the decisions you’ve made
  Metadata standards
  Controlled vocabularies
  Metadata creation workflow
  Discovery functionality the metadata will support
Describe what metadata already exists for these materials


In a grant proposal (2)

Indicate who will do the metadata creation work

Give reasonable cost estimates
The more planning you do, the more likely you are to
  Receive funding
  Complete the project on schedule
  Complete the project within your budget


That’s all for today!

[email protected]
Presentation slides: <http://www.dlib.indiana.edu/~jenlrile/presentations/slis/06fall/l566/l566.ppt>

Handout: <http://www.dlib.indiana.edu/~jenlrile/presentations/slis/06fall/l566/handout.doc>