The Importance of Metadataufdcimages.uflib.ufl.edu/IR/00/00/01/06/00001/Acuril_Mark.pdf · Library...
Metadata Research and Application in the Management of Digital Collections
Mark Sullivan, University of Florida Libraries
Digital Library of the Caribbean
Introduction
• What is metadata?
• Standards
• Standards Comparison
What is metadata?
Library and Museum Metadata
Greek Vase
470 – 460 BC
35 centimeters
Odysseus and Eumaios the Swineherd
from Homer’s story of the Odyssey
Library and Museum Metadata
Format: Greek Vase
Date: 470 – 460 BC
Height: 35 centimeters
Title: [Greek Vase of Odysseus and Eumaios the Swineherd]
Notes: from Homer’s story of the Odyssey
Library and Museum Metadata
Title: L' Isle St. Domingue
Alternate Title: Atlas portatif Universel..
Physical Desc: 1 Map. : col ; 17 x 22 cm on sheet 23 x 32 cm.
Language: French
Creator: Robert de Vaugondy, Gilles , 1688-1766
Date: 1749
Place of Publ: Paris 1749
Subjects: Maps -- Early works to 1800 -- West Indies ( lcsh )
Maps -- Early works to 1800 -- Hispaniola ( lcsh )
Genre: single map ( marcgt )
Maps ( lcsh )
Early works to 1800 ( lcsh )
Spatial Coverage: Haiti, Dominican Republic
Note: Outline color.
Library and Museum Metadata
Title: Avanzamos o retrocedemos? : reflexiones sobre temas educativos
Physical Desc: 165 p.
Language: Spanish
Creator: Radhames Mejia
Publisher: Santo Domingo : PUCMM/CIEDHUMANO
Date: 2010
Subject: Educacion - Republica Dominicana - Ensayos, conferencias, etc.
Abstract: Este libro recoge una serie de artículos sobre temas educativos
publicados en los años 2006 y 2007 en el periódico El Caribe. En su mayoría son
reflexiones alrededor de asuntos de la agenda educativa nacional que iban
apareciendo o, que de manera estructural, caracterizan a la educación dominicana.
Identifier: RD 370.97293 M516a
isbn - 9789945415339
Library Metadata
Library Metadata - MARC
Metadata Standards
Why Metadata Standards?
Standards
“The nice thing about standards is there are so
many to choose from.”
- Andrew Tanenbaum ( 1988 )
Types of Metadata Standards
• Bibliographic Description
• Metadata “Wrappers” and Transport
• Collection Descriptions
• Other Standards
• Authority Standards
• Proprietary Standards
• etc…
Bibliographic Description Standards
• MARC
• Dublin Core
• MODS
• VRA Core
Bibliographic Description Standards: MARC
• MARC
• Originally developed in the 1960s by the Library of Congress
• Most embraced metadata standard in libraries
• MARC 21
• Combination of USMARC and CAN/MARC
• Metadata standard for 21st century
• MarcXML
• Same MARC format, except encoded in XML
Bibliographic Description Standards: MARC
• MARC
• Positives:
• Can encode data at a very granular level
• Very well adopted and works well with machine readers
• Negatives
• Not very human-readable
• Steep learning curve
• Mixes data with display (as commonly implemented)
Bibliographic Description Standards: DC
• Dublin Core
• Originally developed between 1994 and 1995 in Dublin, Ohio, by OCLC
• Used widely on web pages to assist search engines
• Simplified Dublin Core
Title Creator Subject
Description Publisher Contributor
Date Type Format
Identifier Source Language
Relation Coverage Rights
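As a rough illustration of how small the simple Dublin Core vocabulary is, the sketch below writes a record with Python's standard library. It assumes the standard `http://purl.org/dc/elements/1.1/` namespace and a hypothetical `record` wrapper element; the field values are borrowed from the map record shown earlier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # standard DC element namespace
ET.register_namespace("dc", DC_NS)

# The fifteen elements of simple Dublin Core; all are optional and repeatable.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def build_dc_record(fields):
    """Build a <record> element (hypothetical wrapper) from (name, value) pairs."""
    record = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record

record = build_dc_record([
    ("title", "L' Isle St. Domingue"),
    ("creator", "Robert de Vaugondy, Gilles, 1688-1766"),
    ("date", "1749"),
    ("language", "French"),
    ("coverage", "Haiti, Dominican Republic"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Anything outside the fifteen names is rejected, which is the trade-off of the simple form: easy to write, but with no place for finer distinctions.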
Bibliographic Description Standards: DC
• Dublin Core
• Qualified Dublin Core
• Added three elements ( Audience, Provenance, RightsHolder )
• More importantly, added some basic refinements
• Positives
• Widely accepted and easy to read and encode
• Negatives
• Even with Qualified Dublin Core, complex data is difficult to encode
• Lack of data refinement leads to loss of UI options
Bibliographic Description Standards: MODS
• Metadata Object Description Schema
• Developed in 2002 by the Library of Congress
• Beginning to be the de facto standard for digital libraries (MODS/METS)
• Positives
• Can handle a (large) subset of MARC tags
• Handles complex objects and can easily be extended
• Negatives
• Hard to do round-trip conversion: MARC → MODS → MARC
Bibliographic Description Standards: VRA Core
• Visual Resources Association
• Developed in 1996 by the Visual Resources Association
• Used for describing visual/cultural materials
• Includes all the standard tags
• Date, Title, Description, Rights, Subject, etc..
• Also more unique tags
• Cultural Context
• Style
• Technique
Metadata “Wrappers” and Transport
• Contents
• Files and Structural data
• Administrative data
• Choices
• METS
• OAI-PMH
• No wrapper necessary
Metadata “Wrappers” and Transport: METS
• Metadata Encoding & Transmission Standard
• Standard by Library of Congress (2001)
• Contains
• Descriptions section(s)
• Administrative section(s)
• File section(s)
• Structure map(s)
• Accepts any XML schema
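The four kinds of sections listed above can be sketched as a bare METS skeleton. This is a minimal illustration built with Python's standard library, not a validated METS document: the IDs, the MIME type, and the choice of MODS for the descriptive section are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)

def q(ns, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"

mets = ET.Element(q(METS_NS, "mets"))

# Descriptive section: wraps a record in some other schema (MODS here).
dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="MODS")
xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
ET.SubElement(title_info, q(MODS_NS, "title")).text = (
    "Grandmother Puss, or, The grateful mouse"
)

# Administrative section (left empty in this sketch).
ET.SubElement(mets, q(METS_NS, "amdSec"), ID="AMD1")

# File section: one group holding one file (hypothetical ID and MIME type).
file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
ET.SubElement(file_grp, q(METS_NS, "file"), ID="FILE1", MIMETYPE="image/jpeg")

# Structure map: arranges the files and links back to the description.
struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
div = ET.SubElement(struct_map, q(METS_NS, "div"), TYPE="page", DMDID="DMD1")
ET.SubElement(div, q(METS_NS, "fptr"), FILEID="FILE1")

section_names = [child.tag.split("}")[1] for child in mets]
print(section_names)
```

The descriptive record sits inside `xmlData` untouched, which is what lets METS carry Dublin Core, MODS, or a custom schema side by side.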
Metadata “Wrappers” and Transport: OAI-PMH
• Open Archives Initiative - Protocol for Metadata
Harvesting (2000-2001)
• Defines a protocol (over HTTP)
• Main Verbs
• Identify
• ListMetadataFormats – usually Dublin Core (99%)
• ListSets
• ListRecords
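Because OAI-PMH requests are ordinary HTTP requests, the verbs above map directly onto URLs. The sketch below only builds the URLs and sends nothing; `BASE_URL` is a hypothetical endpoint, and `oai_request` is a helper name invented for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/oai"  # hypothetical repository endpoint

def oai_request(verb, **arguments):
    """Return the URL for an OAI-PMH request sent with HTTP GET."""
    return BASE_URL + "?" + urlencode({"verb": verb, **arguments})

identify_url = oai_request("Identify")
formats_url = oai_request("ListMetadataFormats")
sets_url = oai_request("ListSets")
# ListRecords needs a metadataPrefix; oai_dc (simple Dublin Core) is the
# one format every OAI-PMH repository is required to support.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")
print(records_url)
```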
Collection Metadata Standards
• (METS)
• EAD
• Encoded Archival Description
• Electronic Finding Guide/Aid
• Main Sections
• Descriptions
• Container List – can link to digital objects
• Examples: Smithsonian, Univ. of Florida
Still to come…
• Authority Metadata Standards
• NACO
• MADS
• EAC
• Other Standards
• KML
• DarwinCore
• Z39.50 / ZING
• Standards are like weeds….
Proprietary and Custom Formats
• Proprietary File Formats
• Greenstone Document File (doc.xml)
• Fedora Object File (FOXML)
• Custom Schemas
• SobekCM Schema ( UFDC and dLOC )
Back to the beginning?
Evidence-based
• Universities within Florida ( 10 schools )
• Digital Objects
• MARC ( linked to catalog )
• Dublin Core ( no wrapper, simple objects )
• METS wrapper ( for complex items )
• MODS
• EAD for Finding Aids / Guides
• OAI-PMH Harvesting
“One Standard to Rule Them All”
[Diagram: a METS wrapper containing Dublin Core and/or MODS descriptive metadata, plus additional custom data]
Archival Service Metadata
[Diagram labels: Archival: METS, MODS, SobekCM. Service: Greenstone, Fedora, or ContentDM]
Standards Comparison
What is XML?
• XML = eXtensible Markup Language
• Allows information and services to be encoded with
meaningful structure and semantics that computers
and humans can understand.
• Established by the World Wide Web Consortium
XML Syntax
XML Declaration
<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
Elements in matching tags
<title>Grandmother Puss, or, The grateful mouse</title>
Nested tags
<book><title>Grandmother Puss, or, The grateful mouse</title>
</book>
XML Syntax ( continued )
Empty Tags
<book></book>
<book />
Attributes
<identifier type="oclc">32380062</identifier>
Remarks (comments)
<!-- Data for a new digital resource -->
XML Example
• Example for a book with a title and OCLC number
• This defines a book, but does not adhere to any
standard or schema
<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<!-- Data for a new digital resource -->
<book><title>Grandmother Puss, or, The grateful mouse</title>
<identifier type="oclc">32380062</identifier>
</book>
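To show that even schema-less XML like this is machine-readable, here is a minimal sketch that parses the same book with Python's standard library. The XML declaration is left out of the string purely to keep the example short.

```python
import xml.etree.ElementTree as ET

xml_text = """<book>
  <title>Grandmother Puss, or, The grateful mouse</title>
  <identifier type="oclc">32380062</identifier>
</book>"""

book = ET.fromstring(xml_text)

# Elements are found by tag name; attributes are read with .get().
title = book.find("title").text
identifier = book.find("identifier")
print(book.tag, "|", title, "|", identifier.get("type"), identifier.text)
```

The parser recovers the structure, but nothing tells it that `title` or `identifier` mean what a library would expect. That is the gap a schema fills.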
XML Schemas
• Creates a vocabulary to use
• Constrains the structure of XML data
• Defines a namespace
<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<!-- Data for a new digital resource using 'dc' schema -->
<book xmlns:dc="http://www.uflib.ufl.edu/digital/metadata/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.uflib.ufl.edu/digital/metadata/ http://www.uflib.ufl.edu/digital/metadata/dc.xsd" >
<dc:title>Grandmother Puss, or, The grateful mouse</dc:title>
<dc:identifier type="oclc">32380062</dc:identifier>
</book>
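The practical effect of a namespace is easiest to see when reading the data back. In this sketch the namespace URI is the one used on the slide; the alternative prefix `x` is an arbitrary choice to show that only the URI matters.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the slide; the prefix is only shorthand for it.
NS = {"dc": "http://www.uflib.ufl.edu/digital/metadata/"}

xml_text = """<book xmlns:dc="http://www.uflib.ufl.edu/digital/metadata/">
  <dc:title>Grandmother Puss, or, The grateful mouse</dc:title>
  <dc:identifier type="oclc">32380062</dc:identifier>
</book>"""

book = ET.fromstring(xml_text)
title = book.find("dc:title", NS).text

# A lookup without the namespace finds nothing: <title> and <dc:title>
# are different element names.
unqualified = book.find("title")

# Any prefix works as long as it maps to the same URI.
same_title = book.find("x:title", {"x": NS["dc"]}).text
print(title, "|", unqualified, "|", same_title == title)
```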
Metadata Example
Metadata Example: Dublin Core
<dc:title>Grandmother Puss, or, The grateful mouse</dc:title>
<dc:title>Grateful mouse</dc:title>
<dc:creator>McLoughlin Bros., inc ( Publisher )</dc:creator>
<dc:subject>Cats -- Juvenile literature ( lcsh )</dc:subject>
<dc:subject>Mice -- Juvenile literature ( lcsh )</dc:subject>
<dc:description>Cover title.</dc:description>
<dc:description>Pagination includes wrappers; text completed on lower wrapper.</dc:description>
<dc:description>Chromolithographs: cover ill., text illustrations.</dc:description>
<dc:description>(Funding) Preservation and Access for American and British Children's Lit (NEH PA-50860-00).</dc:description>
<dc:publisher>McLoughlin Brothers</dc:publisher>
<dc:type>Book</dc:type>
<dc:format>7, [1] p. : col. ill. ; 17 cm.</dc:format>
<dc:identifier>http://www.uflib.ufl.edu/ufdc/?b=UF00023488</dc:identifier>
<dc:identifier>002798984 (ALEPH)</dc:identifier>
<dc:source>University of Florida</dc:source>
<dc:language>English</dc:language>
<dc:rights>All rights reserved, Board of Trustees of U Florida.</dc:rights>
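One consequence of a flat record like this shows up as soon as it is loaded into a program. The sketch below reads a shortened copy of the record, assuming the standard Dublin Core namespace and a hypothetical `record` wrapper element, and groups the values by element name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

xml_text = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Grandmother Puss, or, The grateful mouse</dc:title>
  <dc:title>Grateful mouse</dc:title>
  <dc:subject>Cats -- Juvenile literature ( lcsh )</dc:subject>
  <dc:subject>Mice -- Juvenile literature ( lcsh )</dc:subject>
  <dc:publisher>McLoughlin Brothers</dc:publisher>
  <dc:language>English</dc:language>
</record>"""

# Collect every value under its element name; elements may repeat.
fields = defaultdict(list)
for element in ET.fromstring(xml_text):
    local_name = element.tag.split("}")[1]  # drop the "{namespace}" part
    fields[local_name].append(element.text)

print(fields["title"])
print(len(fields["subject"]))
```

Both titles come back as plain `title` values. Nothing in the record says which is the main title and which is the alternate, and the subject authority survives only as text inside the string.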
Title Encoding Comparison
Jessamines Children Book
MARC
245 14 |a The Jessamines
|h [electronic resource]
|b a story of a country house /
|c by Grace Stebbing.
Title Encoding Comparison
Jessamines Children Book
MarcXML
<datafield tag="245" ind1="1" ind2="4">
  <subfield code="a">The Jessamines</subfield>
  <subfield code="h">[electronic resource]</subfield>
  <subfield code="b">a story of a country house /</subfield>
  <subfield code="c">by Grace Stebbing.</subfield>
</datafield>
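The MarcXML form carries exactly the same tags, indicators, and subfields as the MARC line, so it can be read with any XML parser. This sketch parses the field above (without a namespace, as on the slide) and uses the second indicator of field 245, which records the number of nonfiling characters, to separate the leading article from the rest of the title.

```python
import xml.etree.ElementTree as ET

xml_text = """<datafield tag="245" ind1="1" ind2="4">
  <subfield code="a">The Jessamines</subfield>
  <subfield code="h">[electronic resource]</subfield>
  <subfield code="b">a story of a country house /</subfield>
  <subfield code="c">by Grace Stebbing.</subfield>
</datafield>"""

field = ET.fromstring(xml_text)
subfields = {sf.get("code"): sf.text for sf in field.findall("subfield")}

# In field 245 the second indicator counts the leading characters
# ("The " = 4) to skip when sorting.
skip = int(field.get("ind2"))
title = subfields["a"]
non_sort, sort_title = title[:skip].strip(), title[skip:]
print(non_sort, "|", sort_title)
```

The split it produces corresponds to the `nonSort` and `title` elements that MODS uses for the same information.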
Title Encoding Comparison
Jessamines Children Book
MODS
<mods:titleInfo>
  <mods:nonSort>The</mods:nonSort>
  <mods:title>Jessamines</mods:title>
  <mods:subTitle>a story of a country house</mods:subTitle>
</mods:titleInfo>
<mods:note type="statement of responsibility">by Grace Stebbing.</mods:note>
Title Encoding Comparison
Jessamines Children Book
Dublin Core
<dc:title>The Jessamines: a story of a country house</dc:title>
<dc:description>by Grace Stebbing.</dc:description>
Resulting MARC
245 1 |a The Jessamines : a story of a country house
|h [electronic resource]
Title Encoding Comparison
• MARC Comparison
• From MODS
245 14 |a The Jessamines
|h [electronic resource]
|b a story of a country house /
|c by Grace Stebbing.
• From Dublin Core
245 1 |a The Jessamines : a story of a country house
|h [electronic resource]
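The difference between the two results comes down to how much structure the source record kept. The sketch below is a simplified illustration, not a real crosswalk: the function names are invented for this example, and the `|h [electronic resource]` subfield is simply hard-coded to match the slides.

```python
def title_from_mods(non_sort, title, subtitle, responsibility):
    """Build a 245 field from separately encoded MODS title parts."""
    main = f"{non_sort} {title}" if non_sort else title
    # Nonfiling characters: the article plus the space that follows it.
    ind2 = str(len(non_sort) + 1) if non_sort else "0"
    return ("1" + ind2, [
        ("a", main),
        ("h", "[electronic resource]"),
        ("b", f"{subtitle} /"),
        ("c", responsibility),
    ])

def title_from_dc(dc_title):
    """Build a 245 field from a single dc:title string."""
    # Nothing marks where the subtitle starts or whether a leading article
    # should be skipped, so everything lands in |a and ind2 stays blank.
    return ("1 ", [("a", dc_title), ("h", "[electronic resource]")])

def show(field):
    """Format a field the way the slides print MARC."""
    indicators, subfields = field
    return "245 " + indicators + " " + " ".join(f"|{c} {v}" for c, v in subfields)

from_mods = title_from_mods(
    "The", "Jessamines", "a story of a country house", "by Grace Stebbing."
)
from_dc = title_from_dc("The Jessamines : a story of a country house")
print(show(from_mods))
print(show(from_dc))
```

The Dublin Core path cannot produce `|b` or `|c` without guessing, which is why the round trip from a simple format back to MARC loses detail.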
Author Encoding Comparison
It's his way, and other stories
MARC
700 |a Elliott, E. S.
|q (Emily Steele),
|d 1836-1897.
|4 aut |4 ill
Author Encoding Comparison
It's his way, and other stories
MODS
<mods:name type="personal">
  <mods:namePart>Elliott, E. S.</mods:namePart>
  <mods:namePart type="date">1836-1897</mods:namePart>
  <mods:displayForm>Emily Steele</mods:displayForm>
  <mods:role>
    <mods:roleTerm type="text">Author</mods:roleTerm>
    <mods:roleTerm type="code" authority="marcrelator">aut</mods:roleTerm>
    <mods:roleTerm type="text">Illustrator</mods:roleTerm>
    <mods:roleTerm type="code" authority="marcrelator">ill</mods:roleTerm>
  </mods:role>
</mods:name>
Author Encoding Comparison
It's his way, and other stories
Dublin Core
<dc:creator>Elliott, E. S. ( Emily Steele ), 1836-1897</dc:creator>
Resulting MARC
700 |a Elliott, E. S. (Emily Steele), 1836-1897
Author Encoding Comparison
• MARC Comparison
• From MODS
700 |a Elliott, E. S.
|q (Emily Steele),
|d 1836-1897.
|4 aut |4 ill
• From Dublin Core
700 |a Elliott, E. S. (Emily Steele), 1836-1897
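One might try to recover the lost subfields from the Dublin Core string by pattern matching. The sketch below does that with a heuristic regular expression; the pattern and the `split_creator` helper are assumptions made for this example, and the second call shows how easily such guessing goes wrong.

```python
import re

# Heuristic pattern for "Name ( Fuller form ), dates". It only holds when
# the creator string happens to follow this exact convention.
CREATOR = re.compile(r"""
    ^(?P<name>[^(]+?)\s*                  # heading, e.g. "Elliott, E. S."
    (?:\(\s*(?P<fuller>[^)]+?)\s*\))?     # optional "( Emily Steele )"
    (?:,\s*(?P<dates>\d{4}-\d{0,4}))?     # optional ", 1836-1897"
    \s*$
""", re.VERBOSE)

def split_creator(text):
    """Guess name, fuller form, and dates from a dc:creator string."""
    match = CREATOR.match(text.strip())
    if match is None:
        return {"name": text.strip(), "fuller": None, "dates": None}
    return match.groupdict()

person = split_creator("Elliott, E. S. ( Emily Steele ), 1836-1897")
print(person)

# The same parentheses hold a role in another record from this deck,
# and the heuristic misreads it as a fuller form of the name.
company = split_creator("McLoughlin Bros., inc ( Publisher )")
print(company)
```

Even when the guess succeeds, the author and illustrator roles that MARC stores in `|4` never come back, because the Dublin Core string never held them.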
Subject Encoding Comparison
5 Little Pigs
MODS
<mods:subject authority="lcsh">
  <mods:topic>Children</mods:topic>
  <mods:topic>Conduct of life</mods:topic>
  <mods:genre>Juvenile fiction</mods:genre>
</mods:subject>
<mods:subject authority="lcsh">
  <mods:temporal>1890</mods:temporal>
  <mods:genre>Children's stories</mods:genre>
</mods:subject>
Subject Encoding Comparison
5 Little Pigs
MARC
650 0 |a Children
|x Conduct of life
|v Juvenile fiction.
655 0 |a Children's stories
|y 1890.
Subject Encoding Comparison
5 Little Pigs
Dublin Core
<dc:subject>Children -- Conduct of life -- Juvenile fiction</dc:subject>
<dc:subject>Children's stories -- 1890</dc:subject>
Resulting MARC
650 4 |a Children -- Conduct of life -- Juvenile fiction
650 4 |a Children's stories -- 1890
Subject Encoding Comparison
• MARC Comparison
• From MODS
650 0 |a Children
|x Conduct of life
|v Juvenile fiction.
655 0 |a Children's stories
|y 1890.
• From Dublin Core
650 4 |a Children -- Conduct of life -- Juvenile fiction
650 4 |a Children's stories -- 1890
Subject Encoding Comparison
• In practice
• Faceted Searching
• Citation Searching
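Faceted searching is where the subject encoding differences become visible to users. The sketch below is a simplified illustration with invented helper names: it counts facet values from typed subject parts (as MODS or MARC subfields provide them) and from the flattened Dublin Core strings.

```python
from collections import Counter

# Subjects as typed parts (as MODS or MARC subfields would give them).
structured = [
    [("topic", "Children"), ("topic", "Conduct of life"), ("genre", "Juvenile fiction")],
    [("genre", "Children's stories"), ("temporal", "1890")],
]
# The same subjects flattened into Dublin Core strings.
flattened = [
    "Children -- Conduct of life -- Juvenile fiction",
    "Children's stories -- 1890",
]

def facets_by_type(subjects):
    """Count values separately for each kind of subject part."""
    facets = {}
    for subject in subjects:
        for kind, value in subject:
            facets.setdefault(kind, Counter())[value] += 1
    return facets

def facets_from_strings(subjects):
    """Split on ' -- '; the parts can be counted but not told apart."""
    return Counter(part.strip() for s in subjects for part in s.split(" -- "))

typed = facets_by_type(structured)
untyped = facets_from_strings(flattened)
print(sorted(typed))
print(sorted(untyped))
```

With typed parts an interface can offer separate topic, genre, and date facets. With the flattened strings, "1890" and "Children" end up in the same undifferentiated list.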
Lessons / Standards Conclusions
• Try not to tie your metadata to your system
• Archival vs. Service
• Don’t be afraid of extending for your own needs, but continue to follow the standards
• Don’t cripple your metadata / Prepare for the most information you would want
• METS / MODS vs. simple Dublin Core