Introducing the ELAR information system architecture Robert Munro & David Nathan Endangered Languages Archive (ELAR), School of Oriental and African Studies, London


Page 1

Introducing the ELAR information system architecture

Robert Munro & David Nathan

Endangered Languages Archive (ELAR), School of Oriental and African Studies, London

Page 2

Outline

1. Introduction

2. The ELAR architecture

3. User Requirements

4. Ingestion

5. Archive & dissemination

6. Conclusions

Page 3

Introduction – who we are

Part of the Hans Rausing Endangered Languages Project (HRELP), based at the School of Oriental and African Studies (SOAS), University of London.

Funded by the Lisbet Rausing Charitable Fund. The other two parts are:

Academic Programme (ELAP) runs postgraduate courses, seminars and workshops

Documentation Programme (ELDP) funds endangered language documentation projects

Page 4

ELAR – current state

In the process of designing and implementing key systems:

accession system (ingestion system)

archive information system

catalogue serving system

archive access system

data storage

long-term backup system

Page 5

ELAR – current state

Source of materials supporting the systems analysis and design:

literature review

review of exemplar materials

interaction with associated archives

interaction with ELDP grantees

interaction with members of ELAP

departmental seminars on language documentation

seminars focused on archiving

Page 6

ELAR – architecture

Strongly informed by the Open Archival Information System (OAIS) Reference Model (CCSDS, 2002)

Page 7

The OAIS model

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 8

The OAIS model

Identify the nature of the materials (content, format and structures) that data producers will create

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 9

The OAIS model

Identify the intended users of the archive, and their user requirements

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 10

The OAIS model

Define dissemination formats, data structures and procedures that support the user requirements of the designated communities

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 11

The OAIS model

Design an archive information system able to store all the information and produce the required dissemination packages.

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 12

The OAIS model

Define ingestion (accession) formats and structures that minimise the conversion cost

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

Page 13

The OAIS model

[Figure: OAIS model diagram showing Producers, Ingestion, Archive, Dissemination and Designated communities]

The archive needs to define three types of ‘packages’: ingestion, archive and dissemination.
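The three package types can be pictured as plain data structures with one conversion step between each; OAIS itself names them Submission, Archival and Dissemination Information Packages (SIP, AIP, DIP). This is a minimal Python sketch under stated assumptions, not ELAR's implementation: the class names, the `ingest` and `disseminate` functions and the internal ID scheme are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical classes illustrating the three OAIS package types (SIP, AIP, DIP).


@dataclass
class IngestionPackage:
    """What a depositor hands over: files as supplied plus depositor metadata."""
    depositor: str
    files: dict[str, bytes]         # original relative path -> content
    metadata: dict[str, str]


@dataclass
class ArchivePackage:
    """What the archive keeps: content and metadata held separately."""
    archive_id: str
    content: dict[str, bytes]       # internal file ID -> content
    metadata: dict[str, str]
    original_paths: dict[str, str]  # internal file ID -> depositor's path


@dataclass
class DisseminationPackage:
    """What a user receives: content with metadata attached for interpretation."""
    archive_id: str
    items: list[tuple[str, bytes]]  # (label, content)
    metadata: dict[str, str]


def ingest(sip: IngestionPackage, archive_id: str) -> ArchivePackage:
    """Assign internal IDs (hypothetical scheme) and remember original paths."""
    content: dict[str, bytes] = {}
    original_paths: dict[str, str] = {}
    for n, (path, data) in enumerate(sorted(sip.files.items())):
        file_id = f"{archive_id}-f{n}"
        content[file_id] = data
        original_paths[file_id] = path
    metadata = {**sip.metadata, "depositor": sip.depositor}
    return ArchivePackage(archive_id, content, metadata, original_paths)


def disseminate(aip: ArchivePackage) -> DisseminationPackage:
    """Label each file with its original path and copy the metadata alongside."""
    items = [(aip.original_paths[fid], data) for fid, data in aip.content.items()]
    return DisseminationPackage(aip.archive_id, items, dict(aip.metadata))
```

Keeping the archive package separate from both ends means ingestion and dissemination formats can change without touching what is stored.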

Page 14

User requirements

EL speakers and communities: continuation of ownership of language and materials

depositors: preserve deposit structure; update material; be correctly attributed

researchers: search (broad, narrow, domain specific); add materials; add relationships

publisher – repurposing: obtain high-quality data for repurposing

publisher – public heritage: archive to act as mediator

public: browse

long-term preserver: obtain clearly structured data

Page 15

Ingestion

A set of formats & structures that can be converted to archive formats with minimal effort:

file formats conforming to the 7 + 1 dimensions of portability (Simons and Bird, 2003; Johnson, 2004)

support for incremental assembly of the deposit

well-documented structures: XML with a schema is ideal

ELAR preferences:

uncompressed, non-proprietary formats

well-documented structures (OLAC, IMDI, custom)
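A first-pass accession check against these preferences could simply sort deposit files by extension. This sketch assumes hypothetical policy tables (illustrative only, not an ELAR list), and an extension is only a proxy: a real check would inspect file contents.

```python
from pathlib import PurePosixPath

# Hypothetical policy tables, for illustration only.
PREFERRED = {".wav", ".xml", ".txt"}         # uncompressed or open, documented
NEEDS_CONVERSION = {".mp3", ".doc", ".xls"}  # lossy-compressed or proprietary


def assess(paths: list[str]) -> dict[str, list[str]]:
    """Sort deposit files into accept / convert / review buckets by extension."""
    report: dict[str, list[str]] = {"accept": [], "convert": [], "review": []}
    for p in paths:
        ext = PurePosixPath(p).suffix.lower()
        if ext in PREFERRED:
            report["accept"].append(p)
        elif ext in NEEDS_CONVERSION:
            report["convert"].append(p)
        else:
            # Unknown formats go to a human rather than being rejected outright.
            report["review"].append(p)
    return report
```

Routing unknown formats to review rather than rejection fits the aim of minimising the depositor's conversion effort.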

Page 16

Ingestion

Filenames and structure of deposit:

we convert deposits to formats / structures appropriate for the archive information system…

but we record the filenames and directory structures of the deposit, allowing depositors to navigate the materials via them
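Recording the deposit's own layout can be as light as a manifest mapping archive identifiers to original relative paths, which then lets a depositor browse by their own folders. A sketch under assumptions: the `obj-NNNNN` identifier scheme and both function names are hypothetical.

```python
from pathlib import Path


def record_deposit(root: str) -> dict[str, str]:
    """Map a hypothetical archive ID to each file's path relative to the deposit root."""
    base = Path(root)
    files = sorted(p for p in base.rglob("*") if p.is_file())
    return {f"obj-{n:05d}": p.relative_to(base).as_posix() for n, p in enumerate(files)}


def browse(manifest: dict[str, str], folder: str) -> list[tuple[str, str]]:
    """Objects whose original path sits directly in `folder` ('' is the deposit root)."""
    hits = [(obj_id, rel) for obj_id, rel in manifest.items()
            if rel.rpartition("/")[0] == folder]
    return sorted(hits, key=lambda item: item[1])
```

The archive can reorganise or convert files freely, since the manifest preserves the depositor's view independently.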

Page 17

Ingestion

Access protocols… tomorrow

Page 18

Archive and dissemination

Granularity:

archive objects can be bundles

archive objects can be a subsection of a file

the types of related materials and their relationships should play a part in the search options
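One way to model objects at several granularities is a single object type with a kind and an optional parent, plus typed relationships that search can filter on. The kind labels, relationship names and functions below are hypothetical, chosen only to illustrate a bundle containing a file divided into time-aligned segments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchiveObject:
    obj_id: str
    kind: str                       # hypothetical kinds: "bundle", "file", "segment"
    parent: Optional[str] = None    # containing bundle or file, if any
    start_ms: Optional[int] = None  # for segments: offset into the parent media
    end_ms: Optional[int] = None


@dataclass(frozen=True)
class Relation:
    source: str
    rel_type: str                   # illustrative names, e.g. "transcribed_by"
    target: str


def parts_of(objects: list[ArchiveObject], obj_id: str) -> list[ArchiveObject]:
    """Direct children: files in a bundle, or segments of a file."""
    return [o for o in objects if o.parent == obj_id]


def search(objects: list[ArchiveObject], relations: list[Relation],
           kind: str, has_relation: Optional[str] = None) -> list[ArchiveObject]:
    """Objects of one kind, optionally only those with a given outgoing relation."""
    hits = [o for o in objects if o.kind == kind]
    if has_relation is not None:
        linked = {r.source for r in relations if r.rel_type == has_relation}
        hits = [o for o in hits if o.obj_id in linked]
    return hits
```

With this shape, a query such as "segments that have a transcription" uses the relationship type directly as a search option.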

Page 19

Archive and dissemination

Version control:

modelling versions of materials is required

multiple types of versioning might be required (migration / dissemination / content update)

versions will be ‘invisible’ to most dissemination packages
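The three kinds of versioning can be recorded explicitly while dissemination resolves to a single version, keeping the history ‘invisible’ unless asked for. A minimal sketch: the version types mirror the slide, but the class names, fields and the rule "newest version of the requested type wins" are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class VersionType(Enum):
    CONTENT_UPDATE = "content update"  # depositor revises the material itself
    MIGRATION = "migration"            # format migration for long-term preservation
    DISSEMINATION = "dissemination"    # derived copy prepared for users


@dataclass(frozen=True)
class Version:
    number: int          # increases with every new version of the object
    vtype: VersionType
    location: str        # hypothetical storage reference


def resolve(history: list[Version],
            purpose: VersionType = VersionType.DISSEMINATION) -> Version:
    """Newest version of the requested type; the rest of the history stays hidden."""
    matching = [v for v in history if v.vtype is purpose]
    if not matching:
        raise LookupError(f"no {purpose.value} version")
    return max(matching, key=lambda v: v.number)
```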

Page 20

Archive and dissemination

Adding materials and metadata:

users can add comments to data

users can add metadata values not provided by a depositor

users can make relationships between items, including mapping

users can supplement the kinds of metadata and relationships in the archive

Note: all the above require moderation and supporting architecture
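Since every kind of user contribution needs moderation, one simple supporting structure is a queue in which contributions stay hidden until a moderator approves them. This is a hypothetical sketch: the status values, contribution kinds and class names are illustrative, not part of the ELAR design.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Contribution:
    contrib_id: int
    user: str
    target: str        # archive object the contribution is about
    kind: str          # e.g. "comment", "metadata", "relationship" (illustrative)
    body: dict
    status: Status = Status.PENDING


class ModerationQueue:
    """Hypothetical queue: user contributions stay hidden until approved."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._items: dict[int, Contribution] = {}

    def submit(self, user: str, target: str, kind: str, body: dict) -> int:
        c = Contribution(next(self._ids), user, target, kind, body)
        self._items[c.contrib_id] = c
        return c.contrib_id

    def review(self, contrib_id: int, approve: bool) -> None:
        self._items[contrib_id].status = Status.APPROVED if approve else Status.REJECTED

    def pending(self) -> list[Contribution]:
        return [c for c in self._items.values() if c.status is Status.PENDING]

    def visible(self, target: str) -> list[Contribution]:
        """Only approved contributions are shown alongside an archive object."""
        return [c for c in self._items.values()
                if c.target == target and c.status is Status.APPROVED]
```

Rejected contributions are kept rather than deleted, so moderation decisions remain auditable.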

Page 21

Archive and dissemination

Language support:

users should be able to add comments / metadata in any language

users should be able to navigate the archive access system via the language preference(s) of their choice

the archive architecture needs to support translations of metadata and comments
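Navigating by language preference usually means storing each translatable string once per language and choosing among them at display time. The sketch assumes translations keyed by a hypothetical string key, ISO 639-1 style language codes, and English as an assumed fallback when none of the user's preferences is available.

```python
# Hypothetical store: translation key -> {language code: text}.
Translations = dict[str, dict[str, str]]

FALLBACK_LANG = "en"  # assumed default; not an ELAR policy


def add_translation(store: Translations, key: str, lang: str, text: str) -> None:
    """Record one language version of a piece of metadata or a comment."""
    store.setdefault(key, {})[lang] = text


def localise(store: Translations, key: str, preferences: list[str]) -> tuple[str, str]:
    """Return (language, text) in the first preferred language that is available."""
    options = store.get(key, {})
    for lang in [*preferences, FALLBACK_LANG]:
        if lang in options:
            return lang, options[lang]
    if options:
        # Nothing matches: show whatever exists rather than nothing at all.
        lang = min(options)
        return lang, options[lang]
    raise KeyError(key)
```

Returning the language alongside the text lets the interface flag when it is showing a fallback rather than the user's choice.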

Page 22

Archive and dissemination

Archive services:

advice and conversion services to depositors

response to requests for information

supporting communications between individuals associated with the archive

Page 23

Archive and dissemination

Archive information system:

separate metadata from materials

avoid redundancy

Dissemination packages:

favour embedding metadata

redundancy is acceptable if it aids interpretation

Technical solutions:

we use MySQL to support the archive

for dissemination, we favour XML and formats that allow metadata to be embedded (PDF, BWF)
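The split between a non-redundant archive store and self-describing dissemination packages can be sketched as two tables plus an XML export. SQLite from Python's standard library stands in for MySQL so the example runs without a server; the schema, element names and function names are hypothetical, and placeholder syntax differs between database drivers.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical normalised schema: each metadata value is stored once, apart from the material.
SCHEMA = """
CREATE TABLE material (id TEXT PRIMARY KEY, filename TEXT NOT NULL);
CREATE TABLE metadata (
    material_id TEXT NOT NULL REFERENCES material(id),
    field TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (material_id, field)
);
"""


def open_archive() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")   # stand-in for the MySQL archive database
    db.executescript(SCHEMA)
    return db


def dissemination_xml(db: sqlite3.Connection, material_id: str) -> str:
    """Build a self-describing XML record with the metadata embedded."""
    row = db.execute("SELECT filename FROM material WHERE id = ?", (material_id,)).fetchone()
    if row is None:
        raise KeyError(material_id)
    item = ET.Element("item", id=material_id)
    ET.SubElement(item, "file").text = row[0]
    md = ET.SubElement(item, "metadata")
    for name, value in db.execute(
        "SELECT field, value FROM metadata WHERE material_id = ? ORDER BY field",
        (material_id,),
    ):
        ET.SubElement(md, "field", name=name).text = value
    return ET.tostring(item, encoding="unicode")
```

Each dissemination record repeats metadata that the database holds only once, which is exactly the redundancy the slide accepts when it aids interpretation.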

Page 24

Conclusions

ELAR is newly opened for deposits

Key systems are in the process of development

Significant features include:

modelling archive objects at different granularities

modelling relationships between objects

users can enter/define their own metadata

users can translate information into the language of their choice

users can navigate via the language(s) of their choice