PREMIS at the British Library
Markus Enders, The British Library
PREMIS Implementation Fair, San Francisco, CA
07 October 2009
2
General
Archival Information Package (AIP)
  AIP is just a conceptual entity
  Conceptual (generic) data model
  Content files stored on write-once media
  Content files may be containerized (stored in ZIP or WARC files)
    One or more containers per AIP; files in containers may belong to various AIPs
AIP Descriptor: METS file describes the content of the AIP
  structure, files, descriptive metadata, preservation metadata
Different METS profiles for different content streams
  eJournals, newspapers (born digital and digitized), web archiving
Common underlying document model for all AIPs
3
METS Descriptor
What is stored in the METS Descriptor?
  Structure of the document (logical and physical in different structMaps)
    Not all content streams have two structMaps (born-digital streams have only one)
  Descriptive metadata
  File Section
    Defines container files as well as content files (nested <file> elements)
  Preservation metadata
    Preservation metadata for files and representations
5
METS Descriptor
What is stored in the METS Descriptor?
  Preservation metadata:
    Preservation metadata for files and representations
      (files: <mets:file> elements; representations: <mets:div> elements)
    Focusses on:
      Audit trail: events and agents
      Technical metadata: basic technical metadata in METS and PREMIS
    Assumption: future migrations of files necessary
      No emulation considered; no environment information stored
6
Preservation Metadata (PREMIS) in METS
Content streams:
  eJournals
    uses PREMIS 1.1; MODS 3.2; METS 1.4; jhove output
  Newspapers
    uses PREMIS 2.0; MODS 3.3; METS 1.8
  Web Archiving
    uses PREMIS 2.0; MODS 3.3; DC; METS 1.8
7
Preservation Metadata (PREMIS): eJournal content stream
Content streams: eJournals
  uses PREMIS 1.1; MODS 3.2; METS 1.4; jhove output
AIP model: one AIP per article, issue, journal, digital manifestation
  Any change leads to a new AIP; the old version of the AIP is referenced
8
Preservation Metadata (PREMIS): eJournal content stream
Journal, Issue, Article: AIP consists just of a METS descriptor (mainly embedded descriptive metadata (MODS) and preservation metadata)
  PREMIS: regarded as representations of intellectual entities
  Relationships between representations are recorded in the MODS record
9
Preservation Metadata (PREMIS): eJournal content stream
Digital Manifestation: AIP consists of content files and METS descriptor
  METS descriptor contains PREMIS records for files and one for the Digital Manifestation itself
  Relationship to the article is recorded in the PREMIS record (manifestationOf)
  Relationship to the submission is recorded in the PREMIS record (containedInSubmission)
Submission: received content files in ZIP (one AIP)
10
Preservation Metadata (PREMIS) and METS: eJournal content stream
amdSec: one amdSec per PREMIS record; referenced from <mets:file> and <mets:div> elements
  Use of <premis:object>, <premis:agent>, <premis:event> elements
techMD:
  Extracted data from Jhove (files)
  PREMIS record of a file
digiprovMD:
  PREMIS record of representations (journal, issue, article)
  PREMIS record of a file
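The linkage between <mets:file> / <mets:div> elements and their amdSec relies on the METS ID and ADMID attributes. The following Python sketch shows how such references could be resolved. It assumes a hand-made, heavily simplified descriptor; all IDs (AMD-FILE-1, FILE-1 and so on) and the helper resolve_admid are hypothetical and not taken from the British Library's profiles.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}

# Hypothetical, simplified descriptor: one amdSec per PREMIS record.
SAMPLE = f"""
<mets:mets xmlns:mets="{METS_NS}">
  <mets:amdSec ID="AMD-FILE-1">
    <mets:techMD ID="TECH-1"/>
    <mets:digiprovMD ID="PROV-1"/>
  </mets:amdSec>
  <mets:amdSec ID="AMD-ARTICLE-1">
    <mets:digiprovMD ID="PROV-2"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE-1" ADMID="AMD-FILE-1"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="article" ADMID="AMD-ARTICLE-1"/>
  </mets:structMap>
</mets:mets>
"""

def resolve_admid(root, element):
    """Return the amdSec elements referenced by an element's ADMID attribute."""
    by_id = {sec.get("ID"): sec for sec in root.findall("mets:amdSec", NS)}
    # ADMID is a whitespace-separated list of ID references.
    return [by_id[ref] for ref in element.get("ADMID", "").split() if ref in by_id]

root = ET.fromstring(SAMPLE)
file_el = root.find(".//mets:file", NS)
div_el = root.find(".//mets:div", NS)

# Local names of the metadata sections reachable from the file and from the div.
file_sections = [child.tag.split("}")[1]
                 for sec in resolve_admid(root, file_el) for child in sec]
div_sections = [child.tag.split("}")[1]
                for sec in resolve_admid(root, div_el) for child in sec]
print(file_sections)
print(div_sections)
```

In this sample the file resolves to both a techMD and a digiprovMD, while the article div resolves to a digiprovMD only, mirroring the split described on the slide.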
11
Preservation Metadata (PREMIS) and METS: eJournal content stream
PREMIS elements used:
  objectIdentifier
  objectCategory
  preservationLevel
  size
  fixity (MD5, SHA-512)
  format (PRONOM)
  Relationships, events and agents where necessary
  Recorded redundantly in the METS <file> element
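The size and fixity values listed above can be computed in a single pass over a content file. This is a minimal Python sketch using the standard hashlib module. The function name fixity_values and the dictionary keys are illustrative assumptions, not part of any British Library tooling.

```python
import hashlib
import os
import tempfile

def fixity_values(path, chunk_size=1 << 20):
    """Return size in bytes plus MD5 and SHA-512 hex digests for a file."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large content files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha512.update(chunk)
            size += len(chunk)
    return {"size": size, "MD5": md5.hexdigest(), "SHA-512": sha512.hexdigest()}

# Usage with a throwaway file standing in for a content file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example content")
values = fixity_values(tmp.name)
os.remove(tmp.name)
print(values["size"], values["MD5"])
```

Computing both digests in one read avoids touching write-once media twice for the same file.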
13
Preservation Metadata (PREMIS): relationships
PREMIS relationships:
  manifestationOf (between Manifestation and Article)
  containedInSubmission (between Manifestation and Submission)
PREMIS relationships (between files; m:n relationships):
  migration
  uncompression
  modification
Relationships are always stored in <digiprovMD>
PREMIS records for files will have techMD and digiprovMD
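As an illustration of how such relationships might be serialized, the sketch below builds simplified PREMIS-style <relationship> elements in Python. Several things are assumptions: namespaces are omitted, the relationshipType values ("derivation", "structural") and the identifier type "local" are placeholders, and the identifiers are invented. The slide names (migration, manifestationOf) are placed in relationshipSubType.

```python
import xml.etree.ElementTree as ET

def relationship(sub_type, related_id, rel_type="derivation", id_type="local"):
    """Build a simplified PREMIS-style <relationship> element (no namespaces)."""
    rel = ET.Element("relationship")
    ET.SubElement(rel, "relationshipType").text = rel_type
    ET.SubElement(rel, "relationshipSubType").text = sub_type
    related = ET.SubElement(rel, "relatedObjectIdentification")
    ET.SubElement(related, "relatedObjectIdentifierType").text = id_type
    ET.SubElement(related, "relatedObjectIdentifierValue").text = related_id
    return rel

# m:n between files: one migrated file may point back to several source files.
migrated = [relationship("migration", src) for src in ("file-0001", "file-0002")]

# Manifestation-to-article link, as used in the eJournal content stream.
manifestation = relationship("manifestationOf", "article-42", rel_type="structural")
print(ET.tostring(manifestation, encoding="unicode"))
```

Each element would sit inside the PREMIS object record that is wrapped in a <digiprovMD> section.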
14
Preservation Metadata (PREMIS): events
PREMIS events (on file level):
  integrityCheck
  formatIdentification
  validation
  wellformness
  propertyExtraction
PREMIS events (on representation level):
  metadataUpdate
Relationships are always stored in <digiprovMD>
PREMIS records for files will have techMD and digiprovMD
15
Preservation Metadata (PREMIS): events
PREMIS events always have an agent
Events and agents are stored in each PREMIS record:
  If an event affects more than one object, it must be repeated in each object's PREMIS record,
  using the same identifier to indicate that it is the same event.
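The repetition rule above can be sketched as follows: one event that touches two files is written into both records with an identical identifier. This Python example is only an illustration. The wrapper element "premis", the helper add_event, and all identifier values are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

def add_event(record, event_id, event_type, agent_id):
    """Append a simplified event and its agent to one object's PREMIS record."""
    event = ET.SubElement(record, "event")
    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "local"
    ET.SubElement(ident, "eventIdentifierValue").text = event_id
    ET.SubElement(event, "eventType").text = event_type
    # Every event carries an agent; here it is stored next to the event.
    agent = ET.SubElement(record, "agent")
    agent_ident = ET.SubElement(agent, "agentIdentifier")
    ET.SubElement(agent_ident, "agentIdentifierType").text = "local"
    ET.SubElement(agent_ident, "agentIdentifierValue").text = agent_id
    return event

# One self-contained record per object (hypothetical object identifiers).
records = {obj_id: ET.Element("premis", {"object": obj_id})
           for obj_id in ("file-0001", "file-0002")}

# The same integrity check affected both files, so it is repeated in each
# record with the same identifier.
for record in records.values():
    add_event(record, "event-0007", "integrityCheck", "agent-ingest")

event_ids = {r.findtext("event/eventIdentifier/eventIdentifierValue")
             for r in records.values()}
print(event_ids)
```

Because the identifier is shared, a consumer can recognise the two entries as a single event even though each record stays self-contained.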
16
Preservation Metadata (PREMIS) in METS
Content streams:
  eJournals
    uses PREMIS 1.1; MODS 3.2; METS 1.4; jhove dtd
  Newspapers
    uses PREMIS 2.0; MODS 3.3; METS 1.8
  Web Archiving
    uses PREMIS 2.0; MODS 3.3; DC; METS 1.8
• Move to PREMIS 2.0
• Changes to AIP model
17
AIPs and PREMIS 2.0
Change of AIP:
  Newspapers need second structMap (and structLink)
  Hierarchy of AIPs no longer possible
    Instead: one AIP per issue
  Manifestations are modelled as a <fileGrp> (various manifestations per AIP possible)
  Support of container files (ZIP, WARC)
    Modelled as nested <file> elements; no PREMIS record for container files
  No file format specific technical metadata is captured
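The container modelling described above can be illustrated with a small Python sketch that nests content <file> elements inside a container <file> within one <fileGrp>. The IDs, the USE value and the MIME types are invented for the example. Only the content files carry an ADMID, reflecting the statement that container files have no PREMIS record.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

# One fileGrp per manifestation (hypothetical USE value).
file_grp = ET.Element(q("fileGrp"), {"USE": "manifestation-1"})

# Container file: no ADMID, because no PREMIS record is kept for it.
container = ET.SubElement(
    file_grp, q("file"), {"ID": "CONTAINER-1", "MIMETYPE": "application/zip"}
)

# Content files nested inside the container, each pointing at its amdSec.
for number in (1, 2):
    ET.SubElement(
        container,
        q("file"),
        {"ID": f"FILE-{number}", "ADMID": f"AMD-FILE-{number}", "MIMETYPE": "image/tiff"},
    )

content_ids = [f.get("ID") for f in container.findall(q("file"))]
print(content_ids)
print(ET.tostring(file_grp, encoding="unicode"))
```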
18
METS and PREMIS 2.0
METS and PREMIS 2.0:
  Use of new METS schema versions:
    <mets:mdWrap MDTYPE="PREMIS:OBJECT">
  <premis:object xsi:type="premis:file"> instead of objectCategory
  just use <digiprovMD>
    Agent, object, event in separate <digiprovMD> elements within the same <amdSec>
  PREMIS record should be self-contained
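A rough Python sketch of the resulting layout: one <amdSec> holding separate <digiprovMD> sections for the object, the event and the agent. The PREMIS payloads are left as empty placeholders, so the output is not schema-valid. The section IDs and the helper wrap are hypothetical, and the PREMIS 2.0 namespace URI is stated here as an assumption to be checked against the schema in use.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "info:lc/xmlns/premis-v2",  # assumed PREMIS 2.0 namespace
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def q(prefix, tag):
    """Qualify a tag or attribute name with one of the namespaces above."""
    return f"{{{NS[prefix]}}}{tag}"

def wrap(amd_sec, md_id, md_type, payload):
    """Put one PREMIS entity into its own digiprovMD/mdWrap/xmlData."""
    prov = ET.SubElement(amd_sec, q("mets", "digiprovMD"), {"ID": md_id})
    md_wrap = ET.SubElement(prov, q("mets", "mdWrap"), {"MDTYPE": md_type})
    ET.SubElement(md_wrap, q("mets", "xmlData")).append(payload)
    return prov

amd_sec = ET.Element(q("mets", "amdSec"), {"ID": "AMD-FILE-1"})

# xsi:type replaces the PREMIS 1.1 objectCategory element.
obj = ET.Element(q("premis", "object"), {q("xsi", "type"): "premis:file"})
wrap(amd_sec, "PROV-OBJ-1", "PREMIS:OBJECT", obj)
wrap(amd_sec, "PROV-EVT-1", "PREMIS:EVENT", ET.Element(q("premis", "event")))
wrap(amd_sec, "PROV-AGT-1", "PREMIS:AGENT", ET.Element(q("premis", "agent")))

md_types = [w.get("MDTYPE") for w in amd_sec.iter(q("mets", "mdWrap"))]
print(md_types)
```

Keeping all three sections in the same amdSec is what makes the record for one object self-contained.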
19
METS and PREMIS 2.0
Extended list of event types:
  deselection: files which are defined in the AIP descriptor but never ingested (no FLocat element)
  metadataExtraction vs. propertyExtraction
Extended list of relationship types (relationshipSubType):
  modification vs. manipulation
21
METS and PREMIS 2.0
Problems:
  Validation
    Using controlled vocabularies
    Considering dependencies between METS and PREMIS
  Standardized workflow for creating METS and PREMIS for all content streams
    Currently specific implementations for each content stream
  Extending the AIP Model
    Preservation metadata for metadata records
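Validation against a controlled vocabulary could start as simply as the Python sketch below. The vocabulary is assembled from the event names mentioned in these slides and is an assumption about the full list. The record layout (no namespaces), the helper name and the value "virusScan" are invented for the example.

```python
import xml.etree.ElementTree as ET

# Event types named in the slides; assumed here to be the whole vocabulary.
EVENT_TYPES = frozenset({
    "integrityCheck", "formatIdentification", "validation", "wellformness",
    "propertyExtraction", "metadataUpdate", "deselection", "metadataExtraction",
})

def unknown_event_types(record):
    """Return sorted eventType values in a record that are not in the vocabulary."""
    found = {(e.text or "").strip() for e in record.iter("eventType")}
    return sorted(found - EVENT_TYPES)

record = ET.fromstring(
    "<premis>"
    "<event><eventType>integrityCheck</eventType></event>"
    "<event><eventType>virusScan</eventType></event>"
    "</premis>"
)
print(unknown_event_types(record))
```

A check like this only covers vocabulary use; dependencies between METS and PREMIS (for example matching identifiers and checksums) would need separate rules.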