The DM2E Data Model and the DM2E Ingestion Infrastructure

Co-funded by the European Union. Work Package 2. All WP Meeting, 11th June 2013, London. Kai Eckert, Evelyn Dröge.

Presentation given at the All WP meeting of the project Digitised Manuscripts to Europeana (DM2E), June 11th 2013, London.

Transcript of The DM2E Data Model and the DM2E Ingestion Infrastructure

Page 1: The DM2E Data Model and the DM2E Ingestion Infrastructure

co-funded by the European Union

Work Package 2

All WP Meeting 11th June 2013, London

Kai Eckert, Evelyn Dröge

Page 2: The DM2E Data Model and the DM2E Ingestion Infrastructure

Timetable

16.04.2013 DM2E Review: Work Package 2 2

Q1

• 2.1, 2.3: Test of external components (MINT, Silk, jMet2Ont, D2R), Data Survey

• 2.2: Basic entity structure of the model

Q2

• 2.1, 2.4, 2.5: Prototype of Infrastructure, Workflows, UI

• 2.2: Mapping workshops, further work on the DM2E data model

Q3

• 2.1, 2.4, 2.5: Software design of Intermediate Architecture, foundational work

• 2.2: Combination of the DM2E model with Linked Data principles, integration with Architecture

Q4

• 2.1, 2.4, 2.5: Provenance Model for DM2E Infrastructure, Web service development

• 2.2: DM2E Data Model 1.0, first stable and operational version

Q5

• Implementation of the DM2E Data Model in MINT

• Development of the Intermediate Version of the Infrastructure: due on July 15th

Page 3: The DM2E Data Model and the DM2E Ingestion Infrastructure

DM2E Data Model 1.0 published

Online Documentation: onto.dm2e.eu/dm2e

Page 4: The DM2E Data Model and the DM2E Ingestion Infrastructure

Further Documentation

Current Version: DM2E Model v1.0

• Documentation: http://dm2e.eu/document/

Model description

OWL File

• Changes made since v1.0 are collected for the next model version, 1.1, on Redmine: https://dm2e.hu-berlin.de/redmine/projects/wp2/wiki/DM2E

Page 5: The DM2E Data Model and the DM2E Ingestion Infrastructure

Modelling Issues

• Namespaces:

– dm2e: <http://onto.dm2e.eu/schemas/dm2e/1.0/> .

– dm2edata: <http://data.dm2e.eu/data/> .

• How to reuse external vocabularies?

– Specification: DM2E scope notes and original description of terms from other vocabularies

• Modelling analogous to EDM (OAI-ORE)

– Extensive use of properties instead of classes, e.g., 52 new properties for edm:ProvidedCHO

– If possible: direct reuse of external vocabularies

• Additional external vocabularies

– Korbo (WP3), Bibliographic Ontology, FaBiO, Publishing Roles Ontology, VIVO Ontology, VoID
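As a small illustration of the two namespaces declared above, the following Python sketch expands a prefixed name into a full IRI. The prefix table is taken from the slide; the helper name `expand` and the local name `example-item` are assumptions made for this example only.

```python
# Prefix table taken from the namespace declarations on this slide.
PREFIXES = {
    "dm2e": "http://onto.dm2e.eu/schemas/dm2e/1.0/",
    "dm2edata": "http://data.dm2e.eu/data/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dm2e:Manuscript' to a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand("dm2e:Manuscript"))
# The local part below is a made-up placeholder, not a real DM2E identifier.
print(expand("dm2edata:example-item"))
```

Prefixes from reused external vocabularies (bibo, fabio, foaf, void and so on) would be added to the same table.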

Page 6: The DM2E Data Model and the DM2E Ingestion Infrastructure

DM2E Model: Class Extension

• New subclasses

[Figure: class diagram. Classes shown: edm:NonInformationResource, edm:Place, edm:PhysicalThing, edm:Event, skos:Concept, edm:TimeSpan, edm:Agent, bibo:Book, dm2e:Manuscript, fabio:Page, fabio:Chapter, dm2e:Work, foaf:Organization, foaf:Person]

Example: Integration of new classes

Page 7: The DM2E Data Model and the DM2E Ingestion Infrastructure

DM2E v1.0 in MINT

Different DM2E model interpretations in MINT

These were assessed in three evaluation rounds

Page 8: The DM2E Data Model and the DM2E Ingestion Infrastructure

Evaluation of the DM2E model interpretations

• 1st Evaluation – UBER

– DM2E v1.0 - EDM Schema Approach

- Resources are not related

- Ranges are not taken into account

Schema is not further used

– DM2E v1.0 Fixed Ranges + DM2E Schema Approach

+ Resources are related

+ Ranges are considered

Basis for the 2nd evaluation
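One way to read "ranges are considered" is that a mapping is only accepted when the class of the object resource matches the declared range of the property. The Python sketch below illustrates that idea. It is not how MINT implements the check, the function name `check_range` is hypothetical, and the range table is a tiny illustrative excerpt.

```python
# Illustrative range table. The two entries follow the EDM definitions of
# these properties; the real DM2E schema in MINT covers many more.
RANGES = {
    "edm:aggregatedCHO": "edm:ProvidedCHO",
    "edm:isShownBy": "edm:WebResource",
}


def check_range(prop: str, object_class: str) -> bool:
    """Return True if object_class matches the declared range of prop.

    Properties without a declared range are accepted unchecked.
    Subclass relations are ignored in this simplified sketch.
    """
    expected = RANGES.get(prop)
    return expected is None or expected == object_class


print(check_range("edm:aggregatedCHO", "edm:ProvidedCHO"))
print(check_range("edm:aggregatedCHO", "edm:Agent"))
```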

Page 9: The DM2E Data Model and the DM2E Ingestion Infrastructure

Evaluation of the DM2E model interpretations

• 2nd Evaluation – UBER/ONB

– DM2E v1.0 Fixed Ranges

• Schema has to be updated

– DM2E v1.0 Fixed Ranges Short A

• Loss of class-specific properties

• Excluded from further development

– DM2E v1.0 Fixed Ranges Short B

• Schema has to be updated

– DM2E v1.0 Fixed Ranges Short C

• Allows inconsistent mappings

• Excluded from further development

Page 10: The DM2E Data Model and the DM2E Ingestion Infrastructure

Evaluation of the DM2E model interpretations

• 3rd Evaluation – UBER/ONB/NTUA

– DM2E v1.0 Fixed Ranges

– DM2E v1.0 Fixed Ranges Short B

We have to choose one of them now!

Page 11: The DM2E Data Model and the DM2E Ingestion Infrastructure

Next Steps of UBER in WP2

• Provide mapping help

• Analyse your mappings

– Which resources are not used?

– Are any resources missing?

• Revise the model

– Smaller logical or typographical errors in the model can be corrected immediately

– Other adaptations will be made over a longer development cycle (a period of several months)

Former mappings will remain valid!

We need your feedback!

Page 12: The DM2E Data Model and the DM2E Ingestion Infrastructure

DM2E Model: Metalevel

• Levels of Abstraction in DM2E

Class | Uplink | Metadata

edm:ProvidedCHO | ore:isAggregatedBy | About the content

ore:Aggregation | ore:isDescribedBy | About the provided metadata, provider's perspective, record level

ore:ResourceMap, dm2e:DataResource, foaf:Document | void:inDataset | About the RDF data, DM2E perspective

void:Dataset (Named Graph)

Metalevel, managed by the DM2E infrastructure

Core data, created by provider mappings
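The Class and Uplink columns above can be read as a chain that leads from the described object up to the dataset. The Python sketch below walks that chain. It assumes that each uplink property points to the class listed in the following row, which matches the ranges of these properties in OAI-ORE and VoID. The function name `path_to_dataset` is hypothetical.

```python
# (class, uplink property, class reached by the uplink), read off the slide.
UPLINKS = [
    ("edm:ProvidedCHO", "ore:isAggregatedBy", "ore:Aggregation"),
    ("ore:Aggregation", "ore:isDescribedBy", "ore:ResourceMap"),
    ("ore:ResourceMap", "void:inDataset", "void:Dataset"),
]


def path_to_dataset(start: str) -> list[str]:
    """Follow uplinks from a class until no further uplink is defined."""
    step = {cls: (prop, target) for cls, prop, target in UPLINKS}
    path = [start]
    while path[-1] in step:
        prop, target = step[path[-1]]
        path += [prop, target]
    return path


print(" -> ".join(path_to_dataset("edm:ProvidedCHO")))
```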

Page 13: The DM2E Data Model and the DM2E Ingestion Infrastructure

DM2E Architecture

[Figure: architecture diagram with areas labelled WP 1, WP 2 and WP 3]

Page 14: The DM2E Data Model and the DM2E Ingestion Infrastructure

OmNom Ingestion Platform

Page 15: The DM2E Data Model and the DM2E Ingestion Infrastructure

WP2 Infrastructure

Page 16: The DM2E Data Model and the DM2E Ingestion Infrastructure

The Result: Linked Data

DM2E All WP Meeting: Work Package 2 16 11.06.2013

Page 17: The DM2E Data Model and the DM2E Ingestion Infrastructure

Workflow: Orchestration of Services

Page 18: The DM2E Data Model and the DM2E Ingestion Infrastructure

Workflows

• OmNom: Distributed infrastructure to ingest and create data in DM2E.

• Workflow = Dataflow

• Data is created and transformed by web services

• Components:

– Input services (File services, D2R instances, OAI-PMH, ...)

– Transformation services (Generic XSLT, MINT, R2R)

– Ingestion services (Output of an ingestion pipeline)

– Contextualization services (Silk)

– Configuration Services (MINT and Silk act as editors)
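The idea "workflow = dataflow" can be sketched as a chain of functions, each standing in for one web service. This is a simplified illustration and not the OmNom API: the names `run_workflow`, `fetch`, `transform` and `ingest` are hypothetical, and plain strings stand in for the files and RDF documents that the real services exchange.

```python
from typing import Callable, Iterable

# A service maps one piece of data to another; here plain strings stand in
# for the files or RDF documents exchanged between web services.
Service = Callable[[str], str]


def run_workflow(data: str, services: Iterable[Service]) -> str:
    """Pass data through each service in order (workflow = dataflow)."""
    for service in services:
        data = service(data)
    return data


# Hypothetical stand-ins for an input, a transformation and an ingestion step.
def fetch(source: str) -> str:
    return f"<record from='{source}'/>"


def transform(xml: str) -> str:
    return f"rdf({xml})"


def ingest(rdf: str) -> str:
    return f"ingested[{rdf}]"


result = run_workflow("provider-file.xml", [fetch, transform, ingest])
print(result)
```

Because every step has the same shape, a contextualization step could be appended to the list without changing `run_workflow`.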

Page 19: The DM2E Data Model and the DM2E Ingestion Infrastructure

The Linked Data Gap

• Linked Data publication is often one-way.

• Linked Data as an export from the "real" data.

• This leads to a gap: YOUR data becomes separated from the Linked Data.

Page 20: The DM2E Data Model and the DM2E Ingestion Infrastructure

Bridge the gap from YOUR data to Linked Data

Image by courtesy of Kiril Havezov, sxc.hu (walker_M)

Page 21: The DM2E Data Model and the DM2E Ingestion Infrastructure

The DM2E Data Bridge

This is YOUR data.

This is the void:Dataset in DM2E.

Page 22: The DM2E Data Model and the DM2E Ingestion Infrastructure

Some more links are actually available...

Page 23: The DM2E Data Model and the DM2E Ingestion Infrastructure

Personalization and Security

Page 24: The DM2E Data Model and the DM2E Ingestion Infrastructure

Authentication Service

● Centralized Authentication and Authorization Service

● Centralized Storage of User Accounts

● User Account Schema based on MINT Model.

● Single Sign-On

● Standards-based: JAAS, Web Services/SOAP

● "Remember Me" support

● Password reset support

Page 25: The DM2E Data Model and the DM2E Ingestion Infrastructure

Contextualization

• Silk: Silk Link Discovery Framework (UMA)

• Definition of linkage rules to create links between Linked Data resources.
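To make the notion of a linkage rule concrete, the Python sketch below links two resources when their labels are sufficiently similar. It does not use Silk or its rule language. The similarity measure (`difflib.SequenceMatcher`), the threshold of 0.9, the function names and all IRIs and labels are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(source: dict[str, str], target: dict[str, str], threshold: float = 0.9):
    """Yield (source IRI, 'owl:sameAs', target IRI) for labels above threshold."""
    for s_iri, s_label in source.items():
        for t_iri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                yield (s_iri, "owl:sameAs", t_iri)


# Made-up IRIs and labels, for illustration only.
local = {"http://example.org/a/1": "Ludwig Wittgenstein"}
remote = {
    "http://example.org/b/7": "ludwig wittgenstein",
    "http://example.org/b/8": "Bertrand Russell",
}
links = list(link(local, remote))
print(links)
```

A real linkage rule would typically combine several comparisons (for example names and dates) before a link is asserted.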

Page 26: The DM2E Data Model and the DM2E Ingestion Infrastructure

Next steps

• Intermediate Version (July 2013)

– Complete transformation and ingestion infrastructure

– Integrated contextualization

– Connection with scholarly environment (WP3)

Page 27: The DM2E Data Model and the DM2E Ingestion Infrastructure

Thank you.
