BSB Demo Day - Schlarb - Workflow-Design

IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Decision-Making in Digitisation through Experimental Workflow Development. Sven Schlarb, Austrian National Library. IMPACT Demo Day, Munich, 11 October 2011

Page 1: BSB Demo Day - Schlarb - Workflow-Design

Decision-Making in Digitisation through Experimental Workflow Development

Sven Schlarb, Austrian National Library

IMPACT Demo Day

Munich, 11 October 2011

Page 2: BSB Demo Day - Schlarb - Workflow-Design

OCR: Challenges …

I. Image preprocessing and OCR

Page 3: BSB Demo Day - Schlarb - Workflow-Design

OCR: Challenges …

II. Linguistic post-processing (mixed languages, historical variants, etc.)

Example: historical variants of the Dutch word ‘wereld’ (world):

werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
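
The variant list above suggests why a lexicon lookup helps in post-processing. The following is only a minimal sketch: the set holds ten spellings copied from the list, while the function names and the whole-token lookup strategy are assumptions made for illustration, not the IMPACT lexicon tooling.

```python
# Hypothetical mini-lexicon: ten historical spellings copied from the
# list above. A real historical lexicon would be far larger.
VARIANTS_OF_WERELD = {
    "werelt", "weerelt", "wereldt", "weereld", "waereld",
    "vvereld", "werreld", "wareld", "werld", "weireld",
}


def normalise(token: str) -> str:
    """Return the modern form for a known variant, else the token unchanged."""
    if token.lower() in VARIANTS_OF_WERELD:
        return "wereld"
    return token


def normalise_text(text: str) -> str:
    """Normalise a whitespace-separated OCR line token by token."""
    return " ".join(normalise(tok) for tok in text.split())
```

For example, `normalise_text("de gantsche waereld")` yields `"de gantsche wereld"`, while unknown tokens pass through untouched.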

Page 4: BSB Demo Day - Schlarb - Workflow-Design

… and a variety of solutions

• 22 different ‘tools’ from different developers and from different work packages
• Different technical environments:
– OCR (C++, C#),
– Image processing & lexica (C, C++, DLL),
– Command-line programs (Windows/Linux),
– Java, Ruby, PHP, Perl, etc.

→ IMPACT Interoperability Framework (IIF)

Page 5: BSB Demo Day - Schlarb - Workflow-Design

Technical challenges

• Scalability
– Volume of input data (single pages / thousands of books/newspapers)
– Size of input data (e.g. very high-resolution images)

• Stability
– Parallelisation: cloned nodes → same behaviour?
– Failover: alternative nodes in case of errors
– Correct functioning of the individual components

• Transparency
– Understandable error messages during batch processing at the different architecture levels (tool, service, and workflow level)
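
The failover and transparency points can be illustrated together. This is a rough sketch under stated assumptions: nodes are modelled as plain Python callables, and `call_with_failover`, `AllNodesFailed`, and the two stand-in nodes are hypothetical names, not part of the IMPACT framework (where, per the slides, Apache Synapse handles load balancing and failover).

```python
from typing import Callable, Sequence


class AllNodesFailed(RuntimeError):
    """Raised when no node could process the input."""


def call_with_failover(nodes: Sequence[Callable[[str], str]], page: str) -> str:
    """Try each node in order; return the first successful result."""
    errors = []
    for index, node in enumerate(nodes):
        try:
            return node(page)
        except Exception as exc:
            # Record which node failed and why, then try the next one.
            errors.append(f"node {index}: {exc}")
    # Every node failed: report all collected errors in one message.
    raise AllNodesFailed("; ".join(errors))


def broken_node(page: str) -> str:
    """Stand-in for a node that is down."""
    raise ConnectionError("service unavailable")


def healthy_node(page: str) -> str:
    """Stand-in for a working node."""
    return f"text of {page}"
```

Calling `call_with_failover([broken_node, healthy_node], "page-001.tif")` falls through to the second node; if every node fails, the raised error lists each node's message so the failure stays traceable during batch processing.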

Page 6: BSB Demo Day - Schlarb - Workflow-Design

Experimental workflow development

• Sample data available online → reproducibility
• Workflows immediately executable → comparability
• Workflow development as a shared, cross-institutional activity → annotation, rating
• “At-a-glance” view of the workflow
• Discoverability of components, workflows, and workflow fragments
• Central results data store

Page 7: BSB Demo Day - Schlarb - Workflow-Design

Interoperability Framework

• Interoperability vs. integration
• Web-based vs. local application/platform

• Java 6
• Apache Tomcat
• Apache Axis2
• Apache Synapse (optional)
• Taverna Workflow Engine

Page 8: BSB Demo Day - Schlarb - Workflow-Design

Tool Wrapper

Requirement: the tool is available as a command-line program

Tool wrapper code is available in the GitHub repository of the Open Planets Foundation (OPF):

https://github.com/openplanets/scape/tree/master/xa-toolwrapper

• Minimal integration effort for tool developers
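
The idea of wrapping a command-line program can be sketched in a few lines. This is not the xa-toolwrapper code from the repository above (which generates web services); `run_tool`, `ToolResult`, and the `{text}` placeholder convention are assumptions made purely for illustration, and the "tool" is a stand-in that upper-cases its argument.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ToolResult:
    exit_code: int
    stdout: str
    stderr: str


def run_tool(command_template: List[str], **params: str) -> ToolResult:
    """Fill the placeholders in the command template and run the tool."""
    command = [part.format(**params) for part in command_template]
    completed = subprocess.run(command, capture_output=True, text=True)
    return ToolResult(completed.returncode, completed.stdout, completed.stderr)


# Stand-in "tool": the Python interpreter upper-casing its first argument.
# A real deployment would name an OCR or image-processing executable here.
UPPERCASE_TOOL = [
    sys.executable, "-c",
    "import sys; print(sys.argv[1].upper())",
    "{text}",
]
```

With this, `run_tool(UPPERCASE_TOOL, text="ocr")` returns exit code 0 and the tool's standard output, which is all a calling service needs to know about the wrapped program.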

Page 9: BSB Demo Day - Schlarb - Workflow-Design

Service-Oriented Architecture

• Java as the programming language
• Standard Apache components
• Synapse as Enterprise Service Bus (load balancing & failover)
• HTTPS encryption & Basic Auth
• Minimal effort for component deployment

Page 10: BSB Demo Day - Schlarb - Workflow-Design

Linking individual components into a “workflow”

Page 11: BSB Demo Day - Schlarb - Workflow-Design

Workflow development

• OCR workflow = data processing pipeline
• Components = processing steps (nodes)
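
The pipeline view can be made concrete with a small sketch. Everything here is an illustrative assumption: the steps operate on strings rather than images, and `build_workflow`, `binarise`, `recognise`, and `post_correct` are hypothetical names, not Taverna or IMPACT components.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]


def build_workflow(steps: List[Step]) -> Step:
    """Chain processing steps so each one consumes the previous output."""
    def run(data: str) -> str:
        return reduce(lambda value, step: step(value), steps, data)
    return run


# Hypothetical stand-in steps that only record the order of processing.
def binarise(data: str) -> str:
    return f"binarised({data})"


def recognise(data: str) -> str:
    return f"ocr({data})"


def post_correct(data: str) -> str:
    return f"corrected({data})"


ocr_workflow = build_workflow([binarise, recognise, post_correct])
```

Running `ocr_workflow("page.tif")` produces `"corrected(ocr(binarised(page.tif)))"`, which makes the node order of the pipeline visible.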

Page 12: BSB Demo Day - Schlarb - Workflow-Design

Workflow components

• “Basic” workflow = minimal component for an IMPACT tool
• Well documented, sample data available, executable

Page 13: BSB Demo Day - Schlarb - Workflow-Design

Workflow management

• Component directory: myExperiment
• Local client: Taverna Workbench
• Web client: project website

Page 14: BSB Demo Day - Schlarb - Workflow-Design

Workflow directory

• Publish components and workflows
• Rate, tag, comment, ...
• References to the components and workflows of other users that were used

Page 15: BSB Demo Day - Schlarb - Workflow-Design

Component catalogue?

[Diagram labels: Tool; GetImageFromURL; URL String; RGB Image; Bitonal image; “Input and output binary image, but incompatible”]

How to find the corresponding tool?

How to proceed in case of a gap?

Many errors occur because the requirements for input and output data are not sufficiently specified (formalised!).
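
One way to formalise input and output requirements is to declare a type for each port and check connections before running anything. The sketch below is an assumption-laden illustration: the port types for `GetImageFromURL` are read off the diagram labels, and the `Binarise` and `Recognise` entries, as well as the function names, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str
    input_type: str
    output_type: str


CATALOGUE = [
    # Port types assumed from the diagram labels.
    Component("GetImageFromURL", "URL String", "RGB Image"),
    # Hypothetical catalogue entries for illustration only.
    Component("Binarise", "RGB Image", "Bitonal image"),
    Component("Recognise", "Bitonal image", "Text"),
]


def is_compatible(producer: Component, consumer: Component) -> bool:
    """A connection is valid only if the declared port types match."""
    return producer.output_type == consumer.input_type


def find_bridge(producer: Component, consumer: Component,
                catalogue: List[Component]) -> Optional[Component]:
    """Look for a single component that closes the gap between two others."""
    for candidate in catalogue:
        if is_compatible(producer, candidate) and is_compatible(candidate, consumer):
            return candidate
    return None
```

Here `GetImageFromURL` cannot feed `Recognise` directly, and `find_bridge` returns the `Binarise` entry as the component that closes the gap; it returns `None` when the catalogue holds no suitable tool.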

Page 16: BSB Demo Day - Schlarb - Workflow-Design

Local client: Taverna Workbench

http://www.taverna.org.uk/

• Background: bioinformatics
• Developed by myGrid, Manchester
• Available for Windows/Linux/OS X as open source

Page 17: BSB Demo Day - Schlarb - Workflow-Design

Workflow development in Taverna

• Workflows can easily be built from available components and workflows (drag and drop)
• Note: complexity is limited → combine related work steps into one component

Page 18: BSB Demo Day - Schlarb - Workflow-Design
Page 19: BSB Demo Day - Schlarb - Workflow-Design
Page 20: BSB Demo Day - Schlarb - Workflow-Design
Page 21: BSB Demo Day - Schlarb - Workflow-Design
Page 22: BSB Demo Day - Schlarb - Workflow-Design
Page 23: BSB Demo Day - Schlarb - Workflow-Design
Page 24: BSB Demo Day - Schlarb - Workflow-Design

Web client: Taverna Server/Workflow Parser

• SOAP/REST API
• Remote workflow execution by submitting the XML instance
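
Submitting a workflow's XML instance over HTTP could look roughly like the sketch below. It does not reproduce the Taverna Server API: the endpoint URL, the `build_run_request` helper, and the placeholder XML are all hypothetical, and the request is only constructed, not sent.

```python
import urllib.request


def build_run_request(endpoint_url: str, workflow_xml: str) -> urllib.request.Request:
    """Prepare an HTTP POST that carries the workflow XML as its body."""
    return urllib.request.Request(
        endpoint_url,
        data=workflow_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Placeholder document, not a real Taverna workflow definition.
WORKFLOW_XML = "<workflow><!-- placeholder --></workflow>"

# Hypothetical endpoint; a real server defines its own resource paths.
request = build_run_request("https://workflows.example.org/runs", WORKFLOW_XML)

# urllib.request.urlopen(request) would send it; omitted because the
# endpoint above does not exist.
```

The design point is that the client needs no local workflow engine: it only ships the XML document and lets the server execute it.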

Page 25: BSB Demo Day - Schlarb - Workflow-Design

Use case: workflows for evaluation

• Tool A vs. tool B (tool A (v1) vs. tool A (v2))
• Workflow X (tools A + B) vs. workflow Y (tools A + C)
• Determine the optimal workflow with respect to the source material
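
Comparing workflows against the same source material can be sketched as scoring each output against a ground-truth transcription. This is a simplified illustration: `difflib` similarity stands in for a proper OCR accuracy measure, and the workflow callables, sample strings, and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def similarity(ocr_text: str, ground_truth: str) -> float:
    """Rough similarity in [0, 1]; not a real character error rate."""
    return SequenceMatcher(None, ocr_text, ground_truth).ratio()


def best_workflow(workflows: Dict[str, Callable[[str], str]],
                  page: str, ground_truth: str) -> str:
    """Run every workflow on the page and return the best-scoring name."""
    scores = {
        name: similarity(run(page), ground_truth)
        for name, run in workflows.items()
    }
    return max(scores, key=scores.get)


# Hypothetical stand-ins that return fixed OCR output for any page.
WORKFLOWS = {
    "X": lambda page: "de wereld is groot",
    "Y": lambda page: "de vvereld is gr00t",
}
```

With the ground truth `"de wereld is groot"`, `best_workflow(WORKFLOWS, "page-001.tif", "de wereld is groot")` selects workflow X; in practice the scores would be aggregated over a representative sample of the material.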

Page 26: BSB Demo Day - Schlarb - Workflow-Design

Central results data store

Interface for storing result data (WebDAV) and for report generation (Apache POI), implemented as a workflow module

Page 27: BSB Demo Day - Schlarb - Workflow-Design

Workflows in ongoing projects

• Workflows in digitisation → IMPACT
• Workflows in linguistic analysis → CLARIN
• Workflows in long-term preservation → SCAPE
• And many more ...

Page 28: BSB Demo Day - Schlarb - Workflow-Design

Compatibility of workflow frameworks

• Example: UIMA ↔ Taverna
• Named-entity extraction → linguistic analysis → Semantic Web
• Digitisation, OCR → long-term preservation

Page 29: BSB Demo Day - Schlarb - Workflow-Design

Thank you! Questions?