UCSD Digital Library Program Working Group February 6, 2002

METS: Metadata Encoding & Transmission Standard


Part One: Problem definition

Digital (Library) Objects

• Reformatted to digital
  • scanned photographs, books and journals
  • digitized audio/video files

• “Born digital”
  • TEI-encoded texts
  • digital images, audio, video files
  • GIS, statistical datasets
  • interactive content

Digital (Library) Objects

• Simple
  – single files, e.g.
    • visual TIFF images
    • MP3 files
    • TEI-encoded text
  – objects stand alone
    • no relationships to other objects

Digital (Library) Objects

• Complex
  – multiple related files, e.g.
    • page images from books or articles
    • multiple channels in digital audio files
    • related sound and text files (multimedia)
    • statistical dataset and codebook
  – objects cannot stand alone
    • one or more related files required to interpret the object
  – requires structural metadata to model

Structural metadata

• Maps physical files (digital assets) to logical items (complex digital objects)

• Examples
  – Scanned print material
    • complex publication structures (e.g. journal runs)
    • ordered relationship between digital page images
  – A/V material
    • multiple resolutions of an image
    • multiple channels of an audio file

Structural metadata

• Examples, continued
  – Multimedia presentations
    • relationship between images, text, sound, video, etc. (time-based or other)
  – Web sites
    • linkages between web pages
    • sitemaps
  – Databases
    • table models and ER diagrams

Digital (Library) Objects

• Also have other (non-structural) metadata
  – descriptive
    • MARC, DC, FGDC, VRA core, other ontologies
  – administrative
    • rights, provenance
  – technical
    • format details, OAIS “representation information”

• Standards exist or are emerging for these


Part Two: Introduction to METS

METS Scope

• Supports
  – Structural metadata
    • complex reformatted or born digital objects
  – Metadata wrapper framework
    • descriptive, administrative, structural, etc.
    • structural required
    • others use namespaces to reference “extension schemas”

Evolved from MOA2

• Making Of America II project
  – Developed November 1997-January 2000
  – Funded by DLF and NEH; participants:
    • Cornell, NYPL, Penn State, Stanford, Berkeley
  – Designed for scanned archival collections
  – XML DTD defining explicit descriptive, administrative and structural metadata

Evolved from MOA2

• February 2001 DLF workshop on structural metadata
  – Harvard, LC, MOA2 participants, others

• Outcome: METS definition
  – emphasis on structural metadata
  – wider scope of participants, content types
  – change to XML schema, framework architecture

METS metadata “buckets”

• METS Header (optional)
• Descriptive metadata (optional)
• Administrative metadata (optional)
• File Inventory (optional)
• Structure map (required)
• Behavioral metadata (optional)

METS metadata

• XML “extension schemas”
  – descriptive metadata
    • Dublin Core, MARC, FGDC, VRA, etc.
    • Berkeley’s GDM schema (from MOA2)
  – administrative/technical metadata
    • NISO image technical metadata
    • LC schemas for A/V technical metadata
    • Rights metadata (e.g. PRISM, XrML, etc.)
    • Provenance metadata

METS Descriptive Metadata Section

[Diagram: Descriptive Metadata contains Metadata Reference and Metadata Wrapper]

• Metadata Reference (mdRef): A link to external descriptive metadata. The type of link (URN/Handle/etc.) is included as an attribute, as is the metadata type.

• Metadata Wrapper (mdWrap): Included descriptive metadata, as either binary data (Base64 encoded) or arbitrary XML using the namespace mechanism. The metadata type is specified as an attribute.
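
The difference between referencing and wrapping descriptive metadata can be sketched in a few lines of Python. This is a minimal illustration, not an official METS tool. The `xmlData` and `binData` container elements come from the published METS schema rather than from these slides, and the section IDs, the URN and the Dublin Core title are made-up sample values.

```python
import base64
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"


def q(ns, tag):
    """Build the {namespace}tag form that ElementTree expects."""
    return "{%s}%s" % (ns, tag)


def dmd_by_reference(sec_id, href, mdtype="MARC", loctype="URN"):
    """dmdSec whose mdRef links to a record stored outside the METS file."""
    sec = ET.Element(q(METS, "dmdSec"), {"ID": sec_id})
    ET.SubElement(sec, q(METS, "mdRef"),
                  {"LOCTYPE": loctype, "MDTYPE": mdtype, q(XLINK, "href"): href})
    return sec


def dmd_wrapped_xml(sec_id, title):
    """dmdSec whose mdWrap embeds namespaced XML (here one Dublin Core title)."""
    sec = ET.Element(q(METS, "dmdSec"), {"ID": sec_id})
    wrap = ET.SubElement(sec, q(METS, "mdWrap"), {"MDTYPE": "DC"})
    data = ET.SubElement(wrap, q(METS, "xmlData"))
    ET.SubElement(data, q(DC, "title")).text = title
    return sec


def dmd_wrapped_binary(sec_id, raw, mdtype="MARC"):
    """dmdSec whose mdWrap embeds arbitrary bytes as Base64 text."""
    sec = ET.Element(q(METS, "dmdSec"), {"ID": sec_id})
    wrap = ET.SubElement(sec, q(METS, "mdWrap"), {"MDTYPE": mdtype})
    data = ET.SubElement(wrap, q(METS, "binData"))
    data.text = base64.b64encode(raw).decode("ascii")
    return sec


# Sample values below are hypothetical.
referenced = dmd_by_reference("dmd1", "urn:x-example:record42")
wrapped = dmd_wrapped_xml("dmd2", "Sample title")
binary = dmd_wrapped_binary("dmd3", b"\x00\x01raw record")
```

In all three cases the metadata type travels as the `MDTYPE` attribute, which is what lets a consumer decide how to interpret the linked or embedded record.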

METS Administrative Metadata Section

[Diagram: Administrative Metadata contains Technical Metadata, IP Rights Metadata, Source Metadata and Preservation Metadata]

• Technical Metadata (techMD): technical metadata regarding content files

• IP Rights Metadata (rightsMD): rights metadata regarding content files or primary source material

• Source Metadata (sourceMD): provenance information for content files

• Preservation Metadata (preservationMD): metadata to assist in preservation of digital content

All sections use generic metadata reference and wrapper subelements.

METS File Inventory

[Diagram: File Inventory (File Group) contains nested File Groups, each containing Files]

• File Group (fileGrp): provides a mechanism for hierarchically subdividing physical files, for example by type

• File (file): provides a pointer to an external file (Flocat) or includes file content internally (Fcontent) in Base64 encoding
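
As a rough sketch of how nested file groups might be traversed, the Python below flattens a small inventory into rows of group path, file ID and location. The fragment is a made-up sample with namespaces omitted for brevity. Element names follow the spelling on these slides (`Flocat`), while the published METS schema capitalizes it as `FLocat`.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory: one top-level group subdivided by file type.
FILE_SEC = """
<fileSec>
  <fileGrp ID="masters">
    <fileGrp ID="images">
      <file ID="f1" MIMETYPE="image/tiff"><Flocat LOCTYPE="URL">page1.tif</Flocat></file>
      <file ID="f2" MIMETYPE="image/tiff"><Flocat LOCTYPE="URL">page2.tif</Flocat></file>
    </fileGrp>
    <fileGrp ID="audio">
      <file ID="f3" MIMETYPE="audio/x-wav"><Flocat LOCTYPE="URL">reading.wav</Flocat></file>
    </fileGrp>
  </fileGrp>
</fileSec>
"""


def list_files(group, path=()):
    """Yield (group path, file ID, location) for every file under a fileGrp."""
    here = path + (group.get("ID"),)
    for f in group.findall("file"):
        yield "/".join(here), f.get("ID"), f.findtext("Flocat")
    for sub in group.findall("fileGrp"):
        # Recurse so that arbitrarily deep group hierarchies are covered.
        yield from list_files(sub, here)


root = ET.fromstring(FILE_SEC)
inventory = [row for grp in root.findall("fileGrp") for row in list_files(grp)]
```

The file `ID` values collected here are what the structural map later points at, so a real consumer would typically keep them in a dictionary keyed by ID.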

METS Structural Map

[Diagram: Structural Map contains nested Divisions, each of which may hold METS Pointers and File Pointers]

The Structural Map provides a tree structure describing the original document. Each division (div) element is a node in that tree, and can identify content files associated with that division by a METS Pointer (mptr) or a File Pointer (fptr).
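
The tree described above lends itself to a simple depth-first walk. The sketch below prints an indented outline of a made-up structural map. It is illustrative only: namespaces are omitted, and the `mptr` link is written as a plain `href` attribute, whereas real METS documents use `xlink:href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural map: a book with two chapters.
STRUCT_MAP = """
<structMap>
  <div LABEL="Book">
    <div LABEL="Chapter 1">
      <div LABEL="Page 1"><fptr FILEID="f1"/></div>
      <div LABEL="Page 2"><fptr FILEID="f2"/></div>
    </div>
    <div LABEL="Chapter 2">
      <mptr href="chapter2-mets.xml"/>
    </div>
  </div>
</structMap>
"""


def outline(div, depth=0):
    """Return indented lines for a div and its descendants, depth first."""
    files = [p.get("FILEID") for p in div.findall("fptr")]
    links = [p.get("href") for p in div.findall("mptr")]
    suffix = ""
    if files:
        suffix += " files=" + ",".join(files)
    if links:
        suffix += " mets=" + ",".join(links)
    lines = ["  " * depth + div.get("LABEL") + suffix]
    for child in div.findall("div"):
        lines.extend(outline(child, depth + 1))
    return lines


root_div = ET.fromstring(STRUCT_MAP).find("div")
lines = outline(root_div)
print("\n".join(lines))
```

Chapter 1 resolves its pages through file pointers into the local file inventory, while Chapter 2 delegates its content to a separate METS document.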


METS Pointer and File Pointer

METS Pointer (mptr): xlink to another METS file containing the content for the associated div. Useful for breaking up large objects (e.g., a journal run) into a series of smaller METS documents.

File Pointer (fptr): Identifies one or more entries in the File Inventory section containing the content for the associated div element. Can also limit the link from a div element to a portion of a content file (e.g., a segment of an audio or video file, a subarea of an image or video file, etc.).

METS File Pointer Mechanisms

[Diagram: File Pointer contains Parallel Files and Sequential Files, each containing Areas]

• File Pointer (fptr): Can identify a single file in the File Inventory using ID/IDREF linking

• Parallel/Sequential (par/seq): Allows a div to be associated with several content files that should be played/displayed in parallel (e.g. video with a separate audio track file) or sequentially.

• Area (area): Identifies a point, linear segment, or 2D area within a content file that corresponds with the associated div element.

METS Area Element Attributes

• FILE: ID for File element in File Inventory
• SHAPE: As in HTML Area element
• COORDS: As in HTML Area element
• BEGIN: A start point within a file for defining a segment
• END: An end point within a file for defining a segment
• BETYPE: Begin/End type: IDREF, Byte Offset, or SMPTE time code
• EXTENT: Length/duration of segment
• EXTYPE: Extent Type: Bytes, or SMPTE

Structure Example

<file ID="f1" MIMETYPE="audio/x-wav" SEQ="1">
  <Flocat LOCTYPE="URN">urn:x-nyu:violet42</Flocat>
</file>

<div N="5" LABEL="Question 5">
  <fptr>
    <seq>
      <area FILE="f1" BEGIN="00:23:17:00" END="00:23:38:00" BETYPE="SMPTE"></area>
    </seq>
  </fptr>
</div>

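To make the BEGIN/END values in the structure example concrete, the sketch below converts SMPTE time codes (hours:minutes:seconds:frames) to seconds and computes the length of the segment. The frame rate is an assumption, since the slide does not state one. It does not affect this particular result because both frame fields are 00.

```python
import xml.etree.ElementTree as ET

# The area element from the structure example, attributes as on the slide.
AREA = '<area FILE="f1" BEGIN="00:23:17:00" END="00:23:38:00" BETYPE="SMPTE"/>'

# Assumption: 30 frames per second, non-drop-frame.
FRAMES_PER_SECOND = 30


def smpte_to_seconds(code, fps=FRAMES_PER_SECOND):
    """Convert an HH:MM:SS:FF time code to seconds."""
    hours, minutes, seconds, frames = (int(part) for part in code.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def segment_seconds(area):
    """Length of the segment an area element selects, in seconds."""
    if area.get("BETYPE") != "SMPTE":
        raise ValueError("this sketch only handles SMPTE begin/end values")
    return smpte_to_seconds(area.get("END")) - smpte_to_seconds(area.get("BEGIN"))


area = ET.fromstring(AREA)
duration = segment_seconds(area)
print(duration)  # 21.0
```

So the div labelled “Question 5” covers a 21-second stretch of the audio file, starting a little over 23 minutes in.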
Related standards: SMIL (W3C), MPEG-7 (ISO)

• Created for multimedia structural encoding

• SMIL has “time-based” orientation
  – for playing multimedia presentations

• Very complex
• May eventually be incorporated

Related standards: RDF (W3C)

• Also metadata wrapper framework
• Structural metadata could be supported, but doesn’t specify how…
• Opaque to use
• No element semantics provided
• element names deliberately meaningless
• Originally intended for descriptive metadata


Related standards: OAIS framework

METS and OAIS framework

• Submission Information Package (SIP)
  • METS as transfer syntax

• Dissemination Information Package (DIP)
  • METS as transfer syntax
  • METS as input to display applications

• Archival Information Package (AIP)
  • METS stored internally in an archive


Part Three: Library Applications of METS

Library Applications

• Digital Object transfer syntax
  – between systems
    • enables interoperability
  – between institutions
    • enables collection sharing
  – implements OAIS SIP/DIP/AIP

Library Applications

• Input to Digital Object delivery systems (aka “disseminators”)
  – Simple bit-streaming
  – XSL stylesheet
  – Custom program for complex digital object display

Harvard’s Page Delivery Service (PDS)

• Range of publication types supported
  – 0-4 levels of hierarchy
    • simple 3 page letter, 20 page article
    • diary with entries
    • book containing chapters containing sections
    • report run containing reports containing sections
    • journal bound in volumes containing issues containing articles

• Implemented as METS “tree”
  • example on METS web site

Harvard’s PDS

[Diagram: Letter. A single METS document acts as both citation level and leaf level and points to five TIFF page images]

Harvard’s PDS

[Diagram: Diary. A citation level METS document for the diary sits above four entries; each entry is a leaf level METS document pointing to its TIFF page images]

Harvard’s PDS

[Diagram: Journal. Levels from top to bottom: Journal (citation level METS), Volumes and Issues (intermediate level METS), Articles (leaf level METS), each article pointing to its TIFF page images]
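
One way to picture the citation, intermediate and leaf levels is as separate METS documents chained together by `mptr` links. The sketch below follows those links from a journal down to its page files. It is an illustration under stated assumptions, not Harvard's implementation: the in-memory `DOCS` dictionary stands in for a repository, the file names and labels are invented, namespaces are omitted, and `mptr` uses a plain `href` attribute instead of `xlink:href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: href -> METS document text.
DOCS = {
    "journal.xml": '<mets><structMap><div LABEL="Journal">'
                   '<div LABEL="Volume 1"><mptr href="vol1.xml"/></div>'
                   '</div></structMap></mets>',
    "vol1.xml": '<mets><structMap><div LABEL="Volume 1">'
                '<div LABEL="Issue 1"><mptr href="issue1.xml"/></div>'
                '</div></structMap></mets>',
    "issue1.xml": '<mets><structMap><div LABEL="Issue 1">'
                  '<div LABEL="Article A"><fptr FILEID="p1"/><fptr FILEID="p2"/></div>'
                  '</div></structMap></mets>',
}


def leaf_pages(href, docs=DOCS, trail=()):
    """Yield (label path, FILEID) for every fptr reachable from a METS document."""
    root_div = ET.fromstring(docs[href]).find("structMap/div")
    yield from _walk(root_div, docs, trail)


def _walk(div, docs, trail):
    here = trail + (div.get("LABEL"),)
    for fptr in div.findall("fptr"):
        yield " > ".join(here), fptr.get("FILEID")
    for mptr in div.findall("mptr"):
        # The linked document's top div describes this same node,
        # so it is entered with the parent's trail to avoid a repeated label.
        yield from leaf_pages(mptr.get("href"), docs, trail)
    for child in div.findall("div"):
        yield from _walk(child, docs, here)


pages = list(leaf_pages("journal.xml"))
```

The sketch has no protection against circular links; a production traversal would track visited documents.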

Harvard’s PDS

• “Page turner” system
  – implemented as a web application
    • Java servlet, SAX parser
  – minimal descriptive metadata
    • display only (not for discovery)
  – no administrative metadata
  – file inventory only for “leaf” nodes

Harvard’s PDS

• METS maintenance system
  – implemented as a web application
    • Java servlet, DOM parser
  – supports structure updates
    • add a missing volume to a run
    • add a missing page to a scanned manuscript
    • switch two page images
  – supports cascading deletes
    • entire logical object including all underlying digital assets

Harvard’s E-Journal Archive

• Capture e-journals of three scholarly journal publishers
  – Wiley, Blackwell, University of Chicago Press

• Accept normative data formats
  – descriptive, administrative metadata
  – article text, images, figures, etc.
  – reference links
  – other supplementary material

Harvard’s E-Journal Archive

• OAIS Submission Information Package
  – received from publishers for each journal issue and article, along with digital content files

• OAIS Archival Information Package
  – stored in Digital Repository Service

• OAIS Dissemination Information Package
  – delivered to subscribers on demand

Harvard’s E-Journal Archive

• Issue-level metadata includes
  – METS header
  – descriptive (i.e. bibliographic) metadata
  – administrative (e.g. rights, provenance, technical) metadata
  – structural metadata

• issue-level content
  – masthead, editorial board, etc.

• issue content
  – articles, correspondence, reviews, editorials, errata, etc.

OAIS

• Article-level metadata
  – METS header
  – descriptive (i.e. bibliographic) metadata
  – administrative (e.g. rights, provenance, technical) metadata
  – structural metadata

• article content
  – XML-encoded text plus images, figures, links, etc.
  – and/or PDF

Example Issue SIP

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:ejar="http://hul.harvard.edu/EJAR/METADATA/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      TYPE="EJARISSUE-major.minor" OBJID="issueid"
      LABEL="issue bibliographic citation" PROFILE="EJAR">

  <metsHdr CREATEDATE="yyyy-mm-dd">
    <agent ROLE="CREATOR" TYPE="ORGANIZATION">
      <name>content provider</name>
    </agent>
  </metsHdr>

  <dmdSec ID="descr:issue">
    <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
      <ejar:descr type="issue">issue descriptive metadata</ejar:descr>
    </mdWrap>
  </dmdSec>

Example Issue SIP

  <admSec ID="admin:issue">
    <rightsMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:copyright>issue copyright metadata</ejar:copyright>
      </mdWrap>
    </rightsMD>
  </admSec>

  <admSec ID="admin:issue-content">
    <techMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:tech type="TEXT">issue content technical metadata</ejar:tech>
      </mdWrap>
    </techMD>
    <digiprovMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:checksum type="MD5">content file checksum</ejar:checksum>
      </mdWrap>
    </digiprovMD>
  </admSec>

Example Issue SIP

  <admSec ID="admin:1">
    <techMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG">
        <niso:...>cover image technical metadata</niso:...>
      </mdWrap>
    </techMD>
    <rightsMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:copyright>cover image copyright metadata</ejar:copyright>
      </mdWrap>
    </rightsMD>
    <digiprovMD>
      <mdWrap MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="EJAR">
        <ejar:checksum type="MD5">cover image checksum</ejar:checksum>
      </mdWrap>
    </digiprovMD>
  </admSec>

Example Issue SIP

  <fileSec>
    <fileGrp ADMID="admin:issue">
      <file ID="file:issue-content" ADMID="admin:issue-content" CREATED="yyyy-mm-dd"
            MIMETYPE="text/xml" OWNERID="id" SIZE="n">
        <Flocat xlink:type="simple" xlink:href="issue.xml"/>
      </file>
      <file ID="file:1" ADMID="admin:1" CREATED="yyyy-mm-dd"
            MIMETYPE="image/tiff" OWNERID="id" SIZE="n">
        <Flocat xlink:type="simple" xlink:href="cover.tif"/>
      </file>
      ...
    </fileGrp>
  </fileSec>

Example Issue SIP

  <structMap TYPE="LOGICAL">
    <div TYPE="EJARISSUE" ADMID="admin:issue" DMD="descr:issue"
         LABEL="issue bibliographic citation">
      <fptr FILEID="file:issue-content"/>
      <fptr FILEID="file:1"/>
      <div TYPE="EJARSECTION" LABEL="section label" ORDER="n">
        <div TYPE="EJARITEM" LABEL="item bibliographic citation" ORDERLABEL="n">
          <mptr xlink:type="simple" xlink:href="itemid1/item-md.xml"/>
        </div>
        ...
      </div>
      ...
    </div>
  </structMap>
</mets>
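
A receiving system would typically start by resolving the pointers in such a package. The sketch below does this for a cut-down copy of the issue SIP: it maps each `fptr` to the location recorded in the file inventory and lists the `mptr` links that lead to separate item-level METS documents. The trimmed document and the `resolve` helper are illustrative assumptions. Element names follow the example as shown on the slides (`Flocat`), whereas the published schema spells it `FLocat`.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"

# Cut-down version of the example issue SIP (metadata sections omitted).
SIP = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp>
      <file ID="file:issue-content" MIMETYPE="text/xml">
        <Flocat xlink:type="simple" xlink:href="issue.xml"/>
      </file>
      <file ID="file:1" MIMETYPE="image/tiff">
        <Flocat xlink:type="simple" xlink:href="cover.tif"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="LOGICAL">
    <div TYPE="EJARISSUE" LABEL="issue bibliographic citation">
      <fptr FILEID="file:issue-content"/>
      <fptr FILEID="file:1"/>
      <div TYPE="EJARSECTION" LABEL="section label">
        <div TYPE="EJARITEM" LABEL="item bibliographic citation">
          <mptr xlink:type="simple" xlink:href="itemid1/item-md.xml"/>
        </div>
      </div>
    </div>
  </structMap>
</mets>"""


def resolve(mets):
    """Return (local files, linked METS documents), each as (div TYPE, href) pairs."""
    # Index the file inventory by file ID.
    files = {f.get("ID"): f.find("m:Flocat", NS).get(HREF)
             for f in mets.iterfind(".//m:file", NS)}
    local, linked = [], []
    for div in mets.iterfind(".//m:div", NS):
        for fptr in div.findall("m:fptr", NS):
            local.append((div.get("TYPE"), files[fptr.get("FILEID")]))
        for mptr in div.findall("m:mptr", NS):
            linked.append((div.get("TYPE"), mptr.get(HREF)))
    return local, linked


local, linked = resolve(ET.fromstring(SIP))
```

A file pointer whose `FILEID` is missing from the inventory raises a `KeyError` here, which is a reasonable place for a real ingest process to reject the package.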

GenDL (Generic Digital Library)

• Focus of METS-based tools
  – Specify how files and parts of files fit together
  – Coordinate external and internal descriptive and administrative metadata with object structure
  – Mitigate complexity of METS for users

• Efficiency and coherence through standardization
  – Automatic generation of digital objects
  – Presentation of disparate digital material through coherent tools


METS tools at UC Berkeley

• GenDB: Generic database to capture structural, descriptive and administrative metadata for digital reformatting projects

• GenX: Java program to extract metadata from GenDB database and package it up into METS

• GenView: Java programs for end user navigation of METS objects

• GenRep: Repository for METS objects

[Diagram: GenDL architecture]

• Gathering Metadata: GenDB
  – GenDB Client (browser/servlet)
  – GenDB Database Server
  – Database (SQL Server)

• Creating METS Objects: GenX
  – GenX METS Generator

• Viewing METS Objects: GenView
  – GenView Client (browser/servlet)
  – GenView Repository Server
  – Digital Object Repository (Unix file system)


GenDB

• Tool to capture structural, descriptive and administrative metadata

• First implemented as an MS Access DB

• Now implemented as a SQL server with web front end

• Java client?


GenDB Key Features

• Exposes Digital Object’s structure
  – UI enables easy visualization to build object structure

• Highly configurable
  – Project manager specifies what fields should appear and how they should be tagged

• Layered architecture enhances flexibility
  – UI doesn’t know underlying DB table structure
  – Different UIs can be layered over same middle layer


GenView

• Tool to view and navigate METS objects

• Web-based user interface (Java)


GenView: Key Features

• Exposes Digital Object’s structure
  – Table of Contents for navigation
  – Select from multiple manifestations of currently selected TOC entry (including side by side display)
  – Link to descriptive/administrative metadata for
    • highest-level object
    • currently selected TOC entry

• Supports non-Roman text (beyond ISO-8859)


Part Four: METS Summary

METS summary

• Descriptive/technical/administrative metadata
  – not defined internally
  – points to external standard schemas
    • Dublin Core, MARC, MPEG-7, etc.
    • AES audio metadata
  – set of “best practice” schemas being identified

METS summary

• Structural metadata
  – defined internally and required
  – SMIL-lite
    • simple support for multimedia, audio/visual
    • SMIL may replace eventually

METS summary

• Current users include
  • UC Berkeley (archival collections)
  • Harvard (scanned print publications, e-journals)
  • Library of Congress (audio/visual collections)
  • EU MetaE project (historic newspapers)
  • Michigan State (oral history collections)
  • Univ of Virginia (FEDORA digital objects)
  • National Library of Australia
  • more daily...

METS summary

• Tools under development for
  – metadata capture
  – transformation
  – transfer
  – dissemination/display

• Profiles necessary for interoperation
  – Which extension schemas used?
  – How structure maps are organized…
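
The two profile questions above (which extension schemas, how structure maps are organized) can be expressed as simple checks. The sketch below is a hypothetical, much simplified profile test. The `PROFILE` dictionary, its name and its allowed values are invented for illustration, namespaces are omitted, and real METS profiles cover far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: allowed metadata types and structure map types.
PROFILE = {
    "name": "EXAMPLE",
    "mdtypes": {"DC", "NISOIMG"},
    "structmap_types": {"LOGICAL"},
}

# Sample document that breaks the profile in two places.
DOC = """<mets PROFILE="EXAMPLE">
  <dmdSec ID="d1"><mdWrap MDTYPE="DC"/></dmdSec>
  <dmdSec ID="d2"><mdWrap MDTYPE="MARC"/></dmdSec>
  <structMap TYPE="PHYSICAL"><div/></structMap>
</mets>"""


def profile_problems(mets, profile):
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if mets.get("PROFILE") != profile["name"]:
        problems.append("document does not claim profile " + profile["name"])
    # Which extension schemas are used?
    for el in mets.iter():
        if el.tag in ("mdWrap", "mdRef") and el.get("MDTYPE") not in profile["mdtypes"]:
            problems.append("unexpected MDTYPE %s" % el.get("MDTYPE"))
    # How are structure maps organized?
    for struct_map in mets.findall("structMap"):
        if struct_map.get("TYPE") not in profile["structmap_types"]:
            problems.append("unexpected structMap TYPE %s" % struct_map.get("TYPE"))
    return problems


problems = profile_problems(ET.fromstring(DOC), PROFILE)
```

Two institutions that agree on such a profile can exchange METS objects knowing in advance which metadata formats and structure map conventions to expect.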

METS summary

• Current status
  – version 1.0 due out in February
  – editorial board being set up
  – LC standards office for maintenance agency
  – DLF and RLG underwriting
    • RLG will host editorial board, offer documentation and training, develop tools, seek funding

METS summary

• METS is not all things to all people…
  – Designed for local institutional application support
    • Solving an immediate local problem
    • Common to many institutions
    • Flexible framework supports many institutional situations
  – Profiling necessary to interoperate
    • For OAIS packages
    • For shared tools
    • For other kinds of interoperation (e.g. cross repository search)