A Logical Model for Digital Archives

Post on 13-Feb-2016



A Logical Model for Digital Archives

Rathachai Chawuthai
rathachai.chawuthai@live.com

Information Management
CSIM / AIT

Draft document 0.1

2

Agenda

• Introduction
• Digital Preservation
• Underlying Community Knowledge
• Logical Model
• Prototype
• Related works

3

Introduction

4

Motivation

• Digital information that we value today may not be accessible, or may not render as it originally did, 100 years from now.
– Technological obsolescence
– Deterioration of digital storage media
• A reader 100 years from now may not understand today's digital information in the way its author intended.
– The author and the reader do not share the same contextual knowledge
– Contextual knowledge changes over time
• There may be common knowledge somewhere that every body of local knowledge refers to.

Yuan Li (2011), Flouris (2007)

5

Users should be able to access and understand digital information in the future

SDA 2011 at Berlin

6

Objectives

• To develop a theory for digital archives
• To design an information model representing contextual knowledge
• To explore knowledge by linking archives across communities ???????
• To develop a prototype system in order to test the theory

7

Scope

• Develop a theory by extending the existing theory of Flouris, “Steps towards a theory of information preservation” (Underlying Community Knowledge)
• Design a “Context Model” of “Underlying Common Community Knowledge”
– Use linked metadata to model contextual knowledge
– Refer to the OAIS information model
– Integrate with PREMIS metadata
• Build an archival system
– Follow the OAIS guidelines
– Integrate with Fedora-Commons as a back-end service

8

Digital Preservation

9

Example in 22nd Century

What is ?

Error: DVD unreadable

Error: No program can open file format .doc

!7rò??àÕ ??ߟ²ÂÚÕ??ߟ²ÂÚ

ðŽɳ!Z?g! Õr/ÕŸ/?rò?

File is read-protected

Please enter password

10

Overview

• Digital preservation is the active management of digital information to ensure its accessibility over time.
• Digital preservation types
– Bit Preservation: the ability to reproduce a particular sequence of bits from storage media at any time.
– Data Preservation: the ability to render the retrieved bit stream and produce meaningful output from it at any time.
– Information Preservation: the ability to understand the rendered digital object at any time.

Flouris (2007)
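Bit preservation can be illustrated with a fixity check. The sketch below is only an assumption about how one might test it (the function names are hypothetical and not from the slides): a digest is recorded at ingest and compared with the digest of the bytes read back later.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte sequence as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_bit_preserved(stored: bytes, recorded_digest: str) -> bool:
    """True if the bytes read back from storage still match the digest recorded at ingest."""
    return sha256_digest(stored) == recorded_digest


# Hypothetical content, used only for illustration.
original = b"A Logical Model for Digital Archives"
digest_at_ingest = sha256_digest(original)

intact_copy = bytes(original)
damaged_copy = b"A Logical Model for Digital Archivez"  # one byte changed

print(is_bit_preserved(intact_copy, digest_at_ingest))
print(is_bit_preserved(damaged_copy, digest_at_ingest))
```

A matching digest only shows that the bits survived. It says nothing about whether the bit stream can still be rendered (data preservation) or understood (information preservation).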

11

Recommendation

• Preservation policy
– Use well-known file formats, such as .pdf, .xml, .tiff, .jpg, and .avi
• Preservation strategies
– Secure storage system, software migration, emulation, media refreshment, and disaster planning
• Content policy
– Track user activities, such as ingest and migration
– Peer review before deposit into the repository
• Rights and agreements
– Because some preservation activities need to duplicate and modify digital content, the rights and agreements attached to each digital object need to be recorded.

Yuan Li (2011)

12

OAIS: Information Model

[Figure: OAIS information package diagram. Labels: Package 1; Content Information; Preservation Description Information (PDI); Archive Packaging Information; Descriptive Information about Package 1]

OCLC.org

13

OAIS: Workflow

[Figure: OAIS workflow diagram. Actors: Producer, Administrator, Consumer. Packages: SIP, AIP, DIP. Functions: Ingest, Store, Manage, Query, Access, Disseminate]

OCLC.org
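The OAIS flow from SIP through AIP to DIP can be sketched as a toy in-memory archive. This is a minimal illustration under stated assumptions: the class, method, and field names are hypothetical and are not taken from the OAIS reference model or from any real repository API.

```python
from dataclasses import dataclass


@dataclass
class InformationPackage:
    kind: str          # "SIP", "AIP", or "DIP"
    content: bytes     # Content Information
    pdi: dict          # Preservation Description Information
    description: str   # Descriptive Information about the package


class Archive:
    """Toy archive: ingest a SIP, store it as an AIP, disseminate a DIP."""

    def __init__(self):
        self._store = {}

    def ingest(self, package_id: str, sip: InformationPackage) -> None:
        """Accept a SIP from a Producer and store it as an AIP."""
        if sip.kind != "SIP":
            raise ValueError("only a SIP can be ingested")
        self._store[package_id] = InformationPackage(
            "AIP", sip.content, dict(sip.pdi), sip.description
        )

    def query(self, keyword: str) -> list:
        """Return ids of stored packages whose description mentions the keyword."""
        return [
            pid for pid, aip in self._store.items()
            if keyword.lower() in aip.description.lower()
        ]

    def disseminate(self, package_id: str) -> InformationPackage:
        """Build a DIP for a Consumer from the stored AIP."""
        aip = self._store[package_id]
        return InformationPackage("DIP", aip.content, dict(aip.pdi), aip.description)


archive = Archive()
archive.ingest(
    "pkg-1",
    InformationPackage("SIP", b"slide deck bytes", {"provenance": "producer"},
                       "Draft slides on digital archives"),
)
dip = archive.disseminate("pkg-1")
print(archive.query("digital"), dip.kind)
```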

14

Preservation Metadata: Basic features

• Provenance
– Describes the history of creation, ownership, access, and change
• Authenticity
– Ensures trustworthiness (does the digital resource render as it originally did?)
• Preservation activities
– Records the processes that support preservation, such as migration
• Technical environment
– Provides the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
– States the intellectual property rights and agreements that must be observed when executing a preservation process, e.g. does the creator allow his/her work to be copied?

OCLC.org, usenix.org

15

PREMIS: Overview

PREMIS from LOC.gov

• Information provided to support preservation management
– Technical information (characteristics)
  • E.g. creator, creation date-time, file format, software/hardware environment, …
– Information about actions on a digital object
  • E.g. ingest, migrate, verify, …
– Inhibitors
  • Password, encryption, … required in order to access digital objects
– Digital provenance
  • Records changes of object format, e.g. .DOC to .PDF
  • Contains application, version, environment, … required in order to render digital objects
– Significant properties (if important)
  • Object characteristics, e.g. font, formatting, color, …
  • Look and feel
– Rights
  • E.g. rights and agreement metadata associated with preservation
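The kinds of information listed above can be pictured as a record that carries an event trail. The sketch below is an assumption for illustration only: the class and field names are hypothetical and are not PREMIS element names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    event_type: str      # e.g. "ingest", "migrate", "verify"
    detail: str
    timestamp: datetime


@dataclass
class ObjectRecord:
    identifier: str
    file_format: str
    environment: str                                 # software needed to render the object
    inhibitors: list = field(default_factory=list)   # e.g. ["password"]
    events: list = field(default_factory=list)       # digital provenance trail

    def record(self, event_type: str, detail: str) -> None:
        """Append an event with the current UTC time."""
        self.events.append(
            PreservationEvent(event_type, detail, datetime.now(timezone.utc))
        )

    def migrate(self, new_format: str, new_environment: str) -> None:
        """Change the format and keep the old one in the provenance trail."""
        self.record("migrate", f"{self.file_format} -> {new_format}")
        self.file_format = new_format
        self.environment = new_environment


# Hypothetical object and environments.
record = ObjectRecord("obj-1", ".doc", "word processor (hypothetical)")
record.record("ingest", "deposited by producer")
record.migrate(".pdf", "PDF viewer (hypothetical)")

for event in record.events:
    print(event.event_type, event.detail)
```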

16

PREMIS

PREMIS from LOC.gov

Entities

17

Challenge

Conceptual level
• Information Preservation

Physical level
• Data Preservation
• Bit Preservation

Flouris (2007)

18

Underlying Community Knowledge

19

Designated Community (DC)

• A DC is a group of people who
– Have common knowledge (concepts)
– Have a common background
– Have common contextual knowledge
– Share the same language
• The knowledge of a DC is called Underlying Community Knowledge (UCK)

Flouris (2007)

20

Underlying Community Knowledge (UCK)

• UCK comprises the knowledge, background, context, common sense, semantics, etc. that are understood by all people in a DC
• This means that people in the same DC know the same UCK and understand every concept inside that UCK

Flouris (2007)

21

Problem

[Figure: a Producer writing under UCK 1 and a Consumer reading under UCK 2]

• Producer (UCK 1): First name = “Rathachai”, Family name = “Chawuthai”
• Write: Name : “Rathachai Chawuthai”
• Consumer (UCK 2) reads: First name = “Chawuthai”, Family name = “Rathachai”

Flouris (2007)

22

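Approach

[Figure: a Producer writing under UCK 1 and a Consumer reading under UCK 2, with a Delta between Write and Read]

• Producer (UCK 1): First name = “Rathachai”, Family name = “Chawuthai”
• Write: Name : “Rathachai Chawuthai”
• Delta
• Consumer (UCK 2) reads: First name = “Rathachai”, Family name = “Chawuthai”

Flouris (2007)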
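The name example can be replayed in code. This is a minimal sketch under a strong assumption: each UCK is reduced to a single convention (the order of the parts of a full name), and the delta is modelled as a re-ordering from one convention to the other. All function names are hypothetical.

```python
# Each UCK is reduced here to one convention: the order of the parts of a full name.
UCK_1 = ("first_name", "family_name")   # producer's convention
UCK_2 = ("family_name", "first_name")   # consumer's convention


def write_name(parts: dict, uck: tuple) -> str:
    """Serialize name parts in the order the given UCK expects."""
    return " ".join(parts[key] for key in uck)


def read_name(text: str, uck: tuple) -> dict:
    """Interpret a written name according to the given UCK."""
    return dict(zip(uck, text.split(" ")))


def apply_delta(text: str, source_uck: tuple, target_uck: tuple) -> str:
    """Re-express text written under source_uck in the convention of target_uck."""
    return write_name(read_name(text, source_uck), target_uck)


producer_view = {"first_name": "Rathachai", "family_name": "Chawuthai"}
written = write_name(producer_view, UCK_1)

# Without a delta, the consumer swaps the two parts.
misread = read_name(written, UCK_2)

# With the delta applied first, the consumer recovers the producer's meaning.
recovered = read_name(apply_delta(written, UCK_1, UCK_2), UCK_2)

print(misread)
print(recovered)
```

The point of the sketch is that the written string alone is ambiguous. Only the pair of conventions, or the difference between them, lets the consumer recover what the producer meant.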

23

Reference

• Some Preliminary Ideas Towards a Theory of Digital Preservation
– Giorgos Flouris

TBD

24

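Challenge

[Figure: two communities, UCK A and UCK B, each with its own definition of Name, and an unknown mapping (?) between them]

• Name = First name + Last name
• Name = Family name + First name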

25

Logical Model

26

Goal

• A model must:
– Represent contextual knowledge
– Be a reference for all underlying community knowledge as a common knowledge
– Identify associations and differences between common knowledge and community knowledge
– Identify associations and differences between bodies of community knowledge
– Capture change or evolution of the common knowledge itself
– Be able to link concepts among designated communities based on common contextual knowledge

27

UCCK

• Underlying Common Community Knowledge
– A common contextual knowledge for all underlying community knowledge

28

UCCK

[Figure: UCCK and its components C, R, HC, HR, IC, IR, AO]

• C: a set of concepts
• R: a set of relations
• HC: a set of hierarchies of classes
• HR: a set of hierarchies of relations
• IC: a set of instances of C
• IR: a set of instances of R
• AO: a set of axioms (inference relations of logic)

Yildiz (2006)
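The seven components can be held in a small data structure. The sketch below is an assumption for illustration: the field names, the pair and triple encodings, and the sample concepts are hypothetical and are not taken from Yildiz (2006).

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)             # C
    relations: set = field(default_factory=set)            # R
    concept_hierarchy: set = field(default_factory=set)    # HC: (sub, super) pairs
    relation_hierarchy: set = field(default_factory=set)   # HR: (sub, super) pairs
    concept_instances: set = field(default_factory=set)    # IC: (instance, concept) pairs
    relation_instances: set = field(default_factory=set)   # IR: (subject, relation, object)
    axioms: list = field(default_factory=list)             # axioms, kept as plain strings here

    def superconcepts(self, concept: str) -> set:
        """All concepts reachable upward from `concept` through HC."""
        found, frontier = set(), [concept]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.concept_hierarchy:
                if sub == current and sup not in found:
                    found.add(sup)
                    frontier.append(sup)
        return found


# Hypothetical content based on the name example from the earlier slides.
ucck = Ontology(
    concepts={"Name", "NamePart", "FirstName", "FamilyName"},
    relations={"hasPart"},
    concept_hierarchy={("FirstName", "NamePart"), ("FamilyName", "NamePart")},
    concept_instances={
        ("Rathachai Chawuthai", "Name"),
        ("Rathachai", "FirstName"),
        ("Chawuthai", "FamilyName"),
    },
    relation_instances={
        ("Rathachai Chawuthai", "hasPart", "Rathachai"),
        ("Rathachai Chawuthai", "hasPart", "Chawuthai"),
    },
)

print(ucck.superconcepts("FirstName"))
```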

29

UCCK

[Figure: UCCK with its components C, R, HC, HR, IC, IR, AO; UCK1 and UCK2 are each derived from UCCK]

30

UCCK

[Figure: UCK1 and UCK2]

31

UCCK

[Figure: UCCK, UCK1, and UCK2]

32

UCCK

[Figure: UCCK, UCK1, and UCK2]

33

UCCK

[Figure: timeline from Past to Future]

34

The Event Ontology

Raimond (2007)

http://motools.sourceforge.net/event/event.html TBD

35

Prototype

36

As a Consumer

[Figure: Consumers use an Archival Information System, which is linked to other Archival Information Systems]

• Browse digital objects
• Search for relevant digital objects across repositories
• Link to other related digital objects under contextual knowledge across systems
• Customize one's own designated community

37

As an Archivist

[Figure: Archivist and Archival Information System]

• Ingest digital objects
• Define links to other objects
• Add metadata according to the digital object's type
• Add underlying community knowledge
• Add contextual knowledge

38

As an Administrator

[Figure: Administrator and Archival Information System]

• Define metadata for each type of digital object

• Define underlying common community knowledge

• Define underlying community knowledge

• Define designated communities

39

Requirements

• Be able to manage various types of digital objects
• Be able to link a digital object to other ones semantically
• Be able to provide contextual knowledge by linking digital objects for each designated community
• Be able to manage various types of metadata
• Be able to do semantic search
• Be able to store knowledge as an ontology
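Semantic linking and search can be sketched with a small in-memory store of (subject, predicate, object) links. This is a plain-Python stand-in, not the prototype's method. The class name, the "about" predicate, and the object identifiers are all hypothetical.

```python
class LinkStore:
    """Holds (subject, predicate, object) links between digital objects and concepts."""

    def __init__(self):
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None) -> list:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def related_objects(self, digital_object: str) -> set:
        """Other digital objects that share at least one concept with the given one."""
        concepts = {
            o for _, _, o in self.match(subject=digital_object, predicate="about")
        }
        return {
            s for s, _, o in self.match(predicate="about")
            if o in concepts and s != digital_object
        }


store = LinkStore()
store.add("archive-a/report.pdf", "about", "concept:DigitalPreservation")
store.add("archive-b/slides.ppt", "about", "concept:DigitalPreservation")
store.add("archive-b/photo.tiff", "about", "concept:Heritage")

print(store.related_objects("archive-a/report.pdf"))
```

Two objects held in different archives become related here only because both link to the same concept, which is the role the slides give to shared contextual knowledge.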

40

Fedora-Commons

• Repository system
• Features
– Collects digital objects and their relations
– Collects metadata
– Collects ontologies
– Supports versioning
• The only repository system that
– Supports semantic search
– Provides web services
• Works as a back-end service

Duraspace.org

41

Drupal

• Popular CMS
• Features
– Rich user management
– Rich content management
– Flexible support for custom modules
• The only CMS that
– Supports a SPARQL endpoint
• Works as a front-end service for end users

Drupal.org

42

Islandora

• A Drupal module
• Features
– Provides an administration panel
– Provides fast search over the Fedora database
– Supports many metadata formats
– Supports many types of digital objects
• The only Drupal module that
– Integrates with Fedora-Commons
– Works with the GSearch service (semantic search of Fedora-Commons)
• Works as a front-end administration service

Islandora.ca

43

System architecture

[Figure: layered architecture diagram. Labels: Consumers, Administrator, Archivist; Islandora, Other content modules; Drupal; Administration Services; Fedora Core Service; GSearch (Generic Search); SOLR; Database]

44

Reference

• To do: find an architecture diagram like Hitest’s

TBD

45

Related works

46

CASPAR

• Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
– An Integrated Project co-financed by the European Union within the Sixth Framework Programme
– Adds contextual knowledge to a digital object according to its characteristics and representations
• Similarity
– Integrates the contextual knowledge of digital objects and estimates the gap between designated communities’ knowledge using semantic technology
• Advantages of my project
– Explores knowledge by linking archives across designated communities with reference to underlying common community knowledge
– Emphasizes how common community knowledge changes over time

Casparpreserves.eu

47

SHAMAN

• Sustaining Heritage Access through Multivalent Archiving
– An Integrated Project co-financed by the European Union within the Seventh Framework Programme
– Represents context as relations between digital objects
– Integrates context information from processes, such as ingest, access, and reuse, with an ontological representation
• Similarity
– Represents context information by semantically linking digital objects and other things, based on document processes
• Advantages of my project
– Explores knowledge by semantically linking to other digital objects and other things with reference to underlying common community knowledge, capturing knowledge from real-world concepts (rather than document processes)

Reference

48

?

49

References

• CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval. EU-funded project (FP6-2005-IST-033572). http://www.casparpreserves.eu
• http://public.ccsds.org/publications/archive/650x0b1.PDF
• http://www.loc.gov/standards/premis/
• http://www.drupal.org
• http://www.duraspace.org/
• http://islandora.ca