Putting together a METS profile. Questions to ask when setting down the METS path Should you design...

Post on 28-Mar-2015


Putting together a METS profile

Questions to ask when setting down the METS path

● Should you design your own profile?

● Should you use someone else’s off the peg?

● Should you adapt someone else’s?

Finding a pre-existing profile

What’s in a METS profile?

● URI

● Short Title

● Abstract

● Creation Date

● Contact Information

● Related Profiles

● Extension Schema

● Rules of Description

● Controlled Vocabularies

● Structural Requirements

● Technical Requirements of Content, Behavior and Metadata Files

● Tools and Applications

● Appendix: Example Document

Currently registered profiles

● Oxford Digital Library METS Profile

● UCB Imaged Object Profile

● UCB Paged Text Object Profile

● Model Imaged Object Profile

● Model Paged Text Object Profile

● The University of Waikato Digital Library Group - Greenstone Project METS Profile [Draft]

● Library of Congress METS Profile for Audio Compact Discs

Putting together your own profile

● Descriptive metadata

● Administrative metadata

● File section

● Structural map

Descriptive metadata

● Embed within the METS file, or hold externally and reference from it?

● One metadata section or several?

● Which schemes?

● Which content rules to follow (AACR2, ISAD(G) etc.)?

Embed or reference?

● Referencing

– Allows metadata not in XML to be used (as a last resort)

– Allows metadata files to be distributed and held anywhere (including different repositories)

– Means that when metadata is updated, only the referenced file is changed, not the METS file

● Embedding

– Requires metadata to be in XML

– Keeps everything in one place for easier archiving (OAIS)

– Prevents dead links

– Allows easier processing
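The two options can be sketched side by side. This is a minimal illustration rather than a fragment from any registered profile: the IDs, the title, and the example.org URL are hypothetical. The first section references an external record with <mdRef>; the second embeds a Dublin Core record with <mdWrap>.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Referenced: the record lives in its own file (hypothetical URL) -->
  <mets:dmdSec ID="DMD-REF">
    <mets:mdRef LOCTYPE="URL" MDTYPE="MODS"
                xlink:href="http://example.org/metadata/item001-mods.xml"/>
  </mets:dmdSec>
  <!-- Embedded: the record travels inside the METS file -->
  <mets:dmdSec ID="DMD-EMB">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Example item</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- A structMap is mandatory; DMDID links the div to both sections -->
  <mets:structMap>
    <mets:div DMDID="DMD-REF DMD-EMB"/>
  </mets:structMap>
</mets:mets>
```

A profile would normally pick one approach and state it in its structural requirements; both appear here only for comparison.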

One metadata section?

● Multiple <dmdSec> sections are allowed in a METS file

● Possible uses of multiple sections:

– Multi-lingual objects, with descriptions in each language in separate sections

– Different schemes revealing different facets of the object (iconography, intellectual content etc.)

– A simple main description and more detailed supplementary descriptions
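As a rough sketch of the multi-lingual case, the fragment below holds an English and a French description in separate <dmdSec> elements and attaches both to the same <div>. The IDs and titles are invented for illustration.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- One descriptive section per language (hypothetical content) -->
  <mets:dmdSec ID="DMD-EN">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title xml:lang="en">View of the harbour</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="DMD-FR">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title xml:lang="fr">Vue du port</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- DMDID takes a space-separated list of section IDs -->
  <mets:structMap>
    <mets:div TYPE="image" DMDID="DMD-EN DMD-FR"/>
  </mets:structMap>
</mets:mets>
```

The same pattern would serve for different schemes: each <dmdSec> simply carries a different MDTYPE.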

Which schemes to use?

● If possible, use schemes recommended by the METS Editorial Board (METS Extenders)

– Dublin Core

– MARC 21 XML Schema (MARCXML)

– Metadata Object Description Schema (MODS)

Dublin Core

● 15 basic fields: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights

● Can be qualified

● A set of suggested qualifiers published by DC

● Problems:

– Unqualified DC too vague for detailed descriptions

– Qualifying DC reduces its interoperability

MARCXML

● A translation of MARC to the XML schema format

● Can move losslessly from MARC to MARCXML and vice versa

MODS

● “Metadata Object Description Schema”

● A subset of MARC intended particularly for digital items

● Richer than unqualified Dublin Core but more interoperable

● Easier for non-librarians than MARCXML

● Generally seen as a good compromise solution for digital objects
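To give a feel for what MODS looks like inside METS, here is a minimal sketch of a descriptive section wrapping a MODS record. The values are placeholders, and a real profile would specify which MODS elements are required.

```xml
<mets:dmdSec ID="DMD-MODS"
             xmlns:mets="http://www.loc.gov/METS/"
             xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:mdWrap MDTYPE="MODS">
    <mets:xmlData>
      <!-- Placeholder values, for illustration only -->
      <mods:mods>
        <mods:titleInfo>
          <mods:title>Example title</mods:title>
        </mods:titleInfo>
        <mods:name type="personal">
          <mods:namePart>Surname, Forename</mods:namePart>
        </mods:name>
        <mods:typeOfResource>still image</mods:typeOfResource>
        <mods:originInfo>
          <mods:dateIssued>1900</mods:dateIssued>
        </mods:originInfo>
      </mods:mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
```

The element names are language-based rather than numeric MARC tags, which is a large part of why MODS tends to be easier for non-librarians to read.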

Content rules

● To ensure interoperability, metadata content should be controlled if possible

● Some possibilities:

– AACR2, particularly if collection digitizes library materials (allowing compatibility with OPAC)

– LCNAF for name authorities

– ISAD(G) for archival materials

– National Council on Archives rules for name authorities?

Administrative metadata

● Most of the same considerations apply to administrative as to descriptive metadata

– Embed within the METS file, or hold externally and reference from it?

– One metadata section or several?

– Which schemes?

Schemas for administrative metadata

● Still images

– MIX: NISO Technical Metadata for Digital Still Images Standards Committee

● Text

– Schema for Technical Metadata for Text

● Video

– VIDEOMD: Video Technical Metadata Extension Schema
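The sketch below shows one way technical metadata might be attached to a file: a <techMD> section inside <amdSec> wraps a MIX record, and the <file> points at it through its ADMID attribute. The IDs, file name and pixel dimensions are invented, and the MIX namespace and element names assume MIX 2.0; other MIX versions differ.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:mix="http://www.loc.gov/mix/v20">
  <mets:amdSec>
    <mets:techMD ID="TMD-0001">
      <!-- NISOIMG is the MDTYPE value for NISO image metadata (MIX) -->
      <mets:mdWrap MDTYPE="NISOIMG">
        <mets:xmlData>
          <mix:mix>
            <mix:BasicImageInformation>
              <mix:BasicImageCharacteristics>
                <mix:imageWidth>2400</mix:imageWidth>
                <mix:imageHeight>3600</mix:imageHeight>
              </mix:BasicImageCharacteristics>
            </mix:BasicImageInformation>
          </mix:mix>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp USE="archive image">
      <!-- ADMID links this file to its technical metadata -->
      <mets:file ID="FILE-0001" MIMETYPE="image/tiff" ADMID="TMD-0001">
        <mets:FLocat LOCTYPE="URL" xlink:href="file0001.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div>
      <mets:fptr FILEID="FILE-0001"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```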

What files will you include in your <fileSec> and how will they be arranged?

● Archival images

– Uncompressed TIFFs (colour or greyscale)

– Group IV compressed TIFFs (bitonal)

– Held on archival file server

● Deliverable images

– JPEGs or GIFs

– Possibly more than one to allow viewing at differing resolutions

● Thumbnails

– JPEGs or GIFs

How will you arrange your <structMap>?

● Probably no internal structure if each METS file contains metadata for a single image only

● Possibly treat METS file as holder for collection of images

– Group into categories?

– Work out a logical sequence

The file inventory <fileSec>

● Which files to include, and in what format?

– Image files: archival format (TIFF), delivery format (JPEG), thumbnails (JPEG)

– Text: XML-marked up text (preferably in TEI), Word files etc?

– AV materials: video files (MPEG, MOV, WMV), sound files (WAV, MP3?)

The file inventory <fileSec>

● Embed or reference?

– Content may be embedded within METS file (as XML or Base64 encoded data)

– Embedding allows all data and metadata to be held together for archival purposes, but files can be huge!

– Embedding is feasible with text, probably best avoided with image, sound, or video!

● How to organise them?

– Group by referent?

– Or by file type?
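A minimal sketch of the two approaches to content, with hypothetical IDs and file names: the first <file> embeds a short text as Base64 data inside <FContent>, while the second leaves an image on disk and points at it with <FLocat>.

```xml
<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
              xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="transcription">
    <!-- Embedded: the Base64 string decodes to the text "Hello, METS!" -->
    <mets:file ID="TXT-EMB" MIMETYPE="text/plain">
      <mets:FContent>
        <mets:binData>SGVsbG8sIE1FVFMh</mets:binData>
      </mets:FContent>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="archive image">
    <!-- Referenced: the image stays outside the METS file -->
    <mets:file ID="IMG-REF" MIMETYPE="image/tiff">
      <mets:FLocat LOCTYPE="URL" xlink:href="page0001.tif"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
```

Base64 inflates content by roughly a third, which is one reason embedding images, sound or video quickly produces very large METS files.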

<fileSec>

[Diagram: a <fileSec> contains <fileGrp> elements; each <fileGrp> contains <file> elements; each <file> contains an <FLocat>.]

Grouping by referent

● Each <fileGrp> element contains the files for a given unit (page of a book, slide, section of video)

● Point at the <fileGrp> element from the <div> within the structural map corresponding to this unit

● Use the GROUPID attribute to differentiate between the types of file

<fileGrp ID="munahi010-aaa-fgrp-0001">
  <file GROUPID="0" ID="munahi010-aaa-0001-0" MIMETYPE="image/tiff" ADMID="munahi010-aaa-tmd-0001-0">
    <FLocat LOCTYPE="URL" xlink:href="file://hfs.ox.ac.uk/data/odl/munahi010/digObjects/aaa/0/munahi010-aaa-0001.tiff"/>
  </file>
  <file GROUPID="6" ID="munahi010-aaa-0001-6" MIMETYPE="image/jpeg" ADMID="munahi010-aaa-tmd-0001-6">
    <FLocat LOCTYPE="URL" xlink:href="http:odl/munahi010/digObjects/aaa/6/munahi010-aaa-0001-6.jpg"/>
  </file>
  <file GROUPID="3" ID="munahi010-aaa-0001-3" MIMETYPE="image/jpeg" ADMID="munahi010-aaa-tmd-0001-3">
    <FLocat LOCTYPE="URL" xlink:href="http:odl/munahi010/digObjects/aaa/3/munahi010-aaa-0001-3.jpg"/>
  </file>
</fileGrp>
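The matching structural map entry is not shown on the slide. Under the approach described, it might look something like the sketch below, where the <fptr> carries the ID of the whole <fileGrp> rather than of a single file. This is an assumption about how the profile does it: FILEID is an ID reference, so the pointer is schema-valid, but software expecting it to resolve to a <file> element may need adapting.

```xml
<structMap xmlns="http://www.loc.gov/METS/">
  <!-- Hypothetical div for the unit whose files are grouped together -->
  <div TYPE="page" ORDER="1">
    <fptr FILEID="munahi010-aaa-fgrp-0001"/>
  </div>
</structMap>
```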

Grouping by file type

● All files of the same type are listed under the same <fileGrp>, e.g.

– All archival images

– All delivery images

– All thumbnails

● The GROUPID attribute is used to indicate the referent (e.g. page) of each image

● Each file is referenced separately in the structural map

<mets:fileGrp USE="archive image">
  <mets:file ID="FID1" MIMETYPE="image/tiff" GROUPID="GID1">
    <mets:FLocat xlink:href="bkm00002773a.tif" LOCTYPE="URL"/>
  </mets:file>
  <mets:file ID="FID2" MIMETYPE="image/tiff" GROUPID="GID2">
    <mets:FLocat xlink:href="bkm00002774a.tif" LOCTYPE="URL"/>
  </mets:file>
</mets:fileGrp>

<mets:div ORDER="1" TYPE="page" LABEL=" Page [1]">
  <mets:fptr FILEID="FID1"/>
  <mets:fptr FILEID="FID3"/>
  <mets:fptr FILEID="FID5"/>
</mets:div>

Organising the structural map

● Need to work out how users will want to browse through the item and design the structure accordingly

– Images – should these be put into a sequence or collated into collections?

– Book -> chapters -> sub-chapters -> page

– Video -> sections -> segments -> timecodes
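The two hierarchies above could be sketched as nested <div> elements. The fragment is illustrative only: labels, timecodes and file IDs (FID1, FID2, VID1) are hypothetical and assumed to be defined in a <fileSec> that is not shown, and the two structural maps are independent examples placed in one wrapper for brevity. For video, an <area> element inside <fptr> selects a time range within a single file.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <!-- Book -> chapter -> page -->
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="Example book">
      <mets:div TYPE="chapter" ORDER="1" LABEL="Chapter 1">
        <mets:div TYPE="page" ORDER="1" LABEL="Page [1]">
          <mets:fptr FILEID="FID1"/>
        </mets:div>
        <mets:div TYPE="page" ORDER="2" LABEL="Page [2]">
          <mets:fptr FILEID="FID2"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
  <!-- Video -> section -> segment, addressed by timecode -->
  <mets:structMap TYPE="logical">
    <mets:div TYPE="video" LABEL="Example video">
      <mets:div TYPE="section" ORDER="1" LABEL="Introduction">
        <mets:div TYPE="segment" ORDER="1">
          <mets:fptr>
            <mets:area FILEID="VID1" BETYPE="TIME"
                       BEGIN="00:00:00" END="00:02:30"/>
          </mets:fptr>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
```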

One structural map or many?

● Do you need separate hierarchies?

– e.g. physical vs logical hierarchies

● Usually one <structMap> is sufficient if hierarchies nest neatly

● If more than one hierarchy is used, how are they linked together?
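One possible answer to the linking question is the <structLink> section, which records links between <div> elements in different structural maps. The sketch below, with hypothetical IDs, ties a logical chapter to the two physical pages it occupies.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:structMap TYPE="physical">
    <mets:div TYPE="volume">
      <mets:div ID="PHYS-P1" TYPE="page" ORDER="1"/>
      <mets:div ID="PHYS-P2" TYPE="page" ORDER="2"/>
    </mets:div>
  </mets:structMap>
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book">
      <mets:div ID="LOG-CH1" TYPE="chapter" LABEL="Chapter 1"/>
    </mets:div>
  </mets:structMap>
  <!-- Each smLink joins one div in the logical map to one in the physical map -->
  <mets:structLink>
    <mets:smLink xlink:from="LOG-CH1" xlink:to="PHYS-P1"/>
    <mets:smLink xlink:from="LOG-CH1" xlink:to="PHYS-P2"/>
  </mets:structLink>
</mets:mets>
```

If the hierarchies nest neatly, a single <structMap> with pages as children of chapters avoids the need for these links altogether.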

Coming next…