Preservation Workflows with Taverna
![Page 1: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/1.jpg)
SCAPE: SCAlable Preservation Environments
Preservation Workflows with Taverna
Clemens Neudecker, Research Department (Afdeling Onderzoek), Koninklijke Bibliotheek
I&O Kennissessie (knowledge session), 28 November 2012
![Page 2: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/2.jpg)
![Page 3: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/3.jpg)
Background
• What is a scientific workflow?
  • “The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules.”
• Background: eSciences, in particular Life Sciences
• Two approaches
  • Data driven (what)
  • Control driven (how)
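The two approaches can be contrasted in a minimal Python sketch. It does not show how Taverna is implemented; the function names and the node format are assumptions made for illustration only. In the control-driven variant the author fixes the order of the steps, while in the data-driven variant a step runs as soon as the data it needs is available.

```python
def run_control_driven(steps, value):
    """Control driven: the author fixes the order in which the steps run."""
    for step in steps:
        value = step(value)
    return value


def run_data_driven(nodes, inputs):
    """Data driven: a node fires as soon as all of its inputs are available.

    nodes maps an output name to (function, list of input names).
    This node format is hypothetical and only serves the illustration.
    """
    values = dict(inputs)
    pending = dict(nodes)
    while pending:
        # A node is ready when every input it depends on has a value.
        ready = [name for name, (_, deps) in pending.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("unsatisfiable dependencies: %s" % sorted(pending))
        for name in ready:
            func, deps = pending.pop(name)
            values[name] = func(*(values[dep] for dep in deps))
    return values
```

In the data-driven variant the order in which the nodes are listed does not matter; only the dependencies between them determine when each one runs.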
![Page 4: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/4.jpg)
Background II
• Why use scientific workflows?
  • Automation of repetitive processes
  • Chaining of distinct components (interoperability)
  • “In-silico experimentation”
  • Documented experiment configuration
  • Re-usable by others (encapsulation)
![Page 5: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/5.jpg)
Background III
• Scientific Workflow Management Systems
  • Taverna (myGrid, UK)
  • Kepler (Kepler, USA)
  • Meandre (SEASR, USA)
  • and there are many more…
• Why Taverna?
  • Good experience in IMPACT, Open source
  • European partner (University of Manchester, UK)
  • Widely used (> 4000 active users)
  • Shields complexity from end-user
![Page 6: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/6.jpg)
Excursus: IMPACT
• EU FP7 project on OCR, coordinated by the KB
• Prototyping use of scientific workflows in digitization
• Some components being further developed in SCAPE
• See also http://impact.kbresearch.nl/
![Page 7: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/7.jpg)
…but back to digital preservation
• Example use cases for scientific workflows
  • File format identification/migration/validation
  • Tool evaluation
  • Quality assurance
  • …and many more!
![Page 8: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/8.jpg)
Enter Taverna
• Web services (SOAP, REST)
• Beanshells (Java scripting, libraries)
• R (statistics)
• Local tools (SH/SSH)
• Excel/CSV
• Plugins
![Page 13: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/13.jpg)
Examples
Validate JPEG2000 with Jpylyzer, convert invalid JP2s based on TIFF masters and validate derived JP2s again using Jpylyzer
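The validate, convert, validate-again loop of this example can be sketched in Python as follows. This is not the actual Taverna workflow. The two callables `is_valid_jp2` and `convert_tiff_to_jp2` are hypothetical placeholders that the caller supplies, for instance thin wrappers around Jpylyzer and an image converter; neither is Jpylyzer's real API.

```python
def repair_invalid_jp2s(pairs, is_valid_jp2, convert_tiff_to_jp2):
    """Validate each JP2, re-derive invalid ones from the TIFF master, validate again.

    pairs: iterable of (jp2_path, tiff_master_path).
    is_valid_jp2(path) -> bool and convert_tiff_to_jp2(tiff_path, jp2_path)
    are hypothetical callables supplied by the caller.
    Returns a dict mapping jp2_path to "valid", "repaired" or "failed".
    """
    outcome = {}
    for jp2_path, tiff_path in pairs:
        if is_valid_jp2(jp2_path):
            outcome[jp2_path] = "valid"
            continue
        # Invalid JP2: derive a new one from the TIFF master ...
        convert_tiff_to_jp2(tiff_path, jp2_path)
        # ... and validate the derived file once more.
        outcome[jp2_path] = "repaired" if is_valid_jp2(jp2_path) else "failed"
    return outcome
```

Passing the validator and converter in as parameters keeps the sketch independent of any particular tool and makes it easy to exercise with stand-ins.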
![Page 14: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/14.jpg)
Examples
Apply Matchbox Book Page Images Duplicate Detection to a list of books from Google Books Project
![Page 15: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/15.jpg)
Examples
Takes a list of ARC files as input and creates a mime type report per ARC and a summary report over all ARCs using TIKA
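The reporting step of this example (one report per ARC plus a summary over all ARCs) can be sketched in Python. The sketch assumes that MIME type detection has already happened, for example with Tika, and that its results are available as a plain dictionary; that input format and the function name are assumptions, not part of the workflow shown on the slide.

```python
from collections import Counter


def mime_reports(detected):
    """Build a per-ARC MIME type report and a summary over all ARCs.

    detected: maps an ARC file name to the list of MIME types detected
    for the records in that file (hypothetical input format).
    """
    per_arc = {arc: Counter(types) for arc, types in detected.items()}
    summary = Counter()
    for counts in per_arc.values():
        summary.update(counts)  # add this ARC's counts to the overall totals
    return per_arc, summary
```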
![Page 16: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/16.jpg)
Examples
Validating WAV File Format using JHOVE2 Web Service
![Page 17: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/17.jpg)
Scalability
• Taverna workflows on Hadoop
  • Hadoop = open-source Map/Reduce implementation (Apache, originally developed largely at Yahoo)
  • Idea: Execute workflows on a Hadoop cluster
  • Mainly responsible: AIT, UMAN
  • Clusters: IMF, ONB, KB, SB
• Some problems:
  • Scheduling: Hadoop (1 big jar) or Taverna (many small jars)?
  • Error handling (long running automated workflows)
  • List handling (cross product vs. dot product)
  • “Small files problem” → Hadoop SequenceFile
• OPF Blog: http://www.openplanetsfoundation.org/blogs/2012-08-07-big-data-processing-chaining-hadoop-jobs-using-taverna
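The difference between the two list handling strategies named above can be shown with a small Python sketch. A cross product combines every element of one input list with every element of the other, while a dot product pairs elements that share the same position. The function names are illustrative, and the strict length check in the dot product is an assumption of this sketch, not a statement about how Taverna treats lists of unequal length.

```python
from itertools import product


def cross_product(xs, ys):
    """Every element of xs combined with every element of ys."""
    return list(product(xs, ys))


def dot_product(xs, ys):
    """Elements paired by position; lists must have equal length (assumption)."""
    if len(xs) != len(ys):
        raise ValueError("dot product needs lists of equal length")
    return list(zip(xs, ys))
```

With two input lists of n and m items, a cross product triggers n × m invocations of the downstream step and a dot product only n, so picking the wrong strategy on large collections changes the workload dramatically.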
![Page 18: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/18.jpg)
Examples
Workflow for preparing large document collections for data analysis. Different types of Hadoop jobs (Hadoop-Streaming-API, Hadoop Map/Reduce, and Hive) are used (ONB).
Processing time for 60,000 books / 24 million pages: 6 h
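To put the reported figures in perspective, the following small Python calculation derives throughput from them. Only the three input numbers come from the slide; the function name and the derived metrics are illustrative, and the result is an average over the whole run that says nothing about the cluster size behind it.

```python
def throughput(books, pages, hours):
    """Derive average throughput figures from a reported batch run."""
    return {
        "pages_per_book": pages / books,
        "books_per_hour": books / hours,
        "pages_per_second": pages / (hours * 3600),
    }


# Figures reported on the slide: 60,000 books, 24 million pages, 6 hours.
stats = throughput(books=60_000, pages=24_000_000, hours=6)
```

This works out to 400 pages per book on average, 10,000 books per hour, and roughly 1,100 pages per second.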
![Page 19: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/19.jpg)
Demo(s)
![Page 20: Preservation Workflows with Taverna](https://reader033.fdocuments.net/reader033/viewer/2022052322/557dcb6dd8b42ae4688b49a4/html5/thumbnails/20.jpg)
Want some more?
• SCAPE source code on GitHub: github.com/openplanets/scape
• SCAPE for Developers: SCAPE Developer's Guide
• SCAPE Platform: SCAPE Preservation Execution Platform
• SCAPE workshops, hackathons: check with us! http://www.scape-project.eu/events