Reproducible scientific workflows - Acting on Change 2016

Posted on 15-Feb-2017

Reproducible scientific workflows

Tomasz Miksa, Vienna University of Technology & SBA Research, Austria

Tomasz Miksa tmiksa@sba-research.org

eScience and Research Infrastructures

Scientists exchange:
- facilities
- resources
- services
- datasets

Research requires:
- special tooling and software
- workflows to capture, transform, visualize and interpret the data

Taverna Workflow

Workflows and Context

‘Workflows’ can be:
- ad hoc commands and scripts executed manually
- well-structured processes executed within a controlled environment

Workflows:
- share infrastructure with other processes
- delegate tasks to tools installed in the system
- require specific configurations
- can use distributed systems

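#!/bin/bash
# stop at the first failing step instead of continuing with missing files
set -e

# fetch data
java -jar GestBarragensWSClientIQData.jar
unzip -o IQData.zip

# fix encoding: convert the R script from Latin-1 to UTF-8
iconv -f LATIN1 -t UTF-8 iq.r > iq_utf8.r

# generate references
R --vanilla < iq_utf8.r > IQout.txt

# create pdf (pdflatex runs twice so that cross-references resolve)
pdflatex iq.tex
pdflatex iq.tex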

Script

Reproducibility

Recent studies report very low reproducibility in:
- medicine
- economics
- computer science

Reproducibility requires:
- well-documented research workflows
- precise information on the experiment's environment
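A minimal sketch of how such environment information could be captured automatically, assuming a Unix-like system with `uname` and `date`: the shell script below records the date, host, operating system and the versions of the tools the workflow script calls (java, R, pdflatex, unzip, iconv) in a text file that can be stored next to the results. The file name `environment.txt` and the helper `record_version` are hypothetical choices for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: record the execution environment of a workflow run.
# ASSUMPTION: the output file name and the tool list are illustrative.
set -eu

out="environment.txt"

# When the run happened, on which host, OS release and architecture
{
  echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "host: $(uname -n)"
  echo "os: $(uname -s) $(uname -r) $(uname -m)"
} > "$out"

# Append the first line of a tool's version report, or note that the tool
# is missing. stderr is merged with 2>&1 because some tools
# (e.g. java -version) print their version to stderr.
record_version() {
  local name="$1"
  shift
  if command -v "$name" > /dev/null 2>&1; then
    echo "$name: $("$name" "$@" 2>&1 | head -n 1)" >> "$out"
  else
    echo "$name: not installed" >> "$out"
  fi
}

record_version java -version
record_version R --version
record_version pdflatex --version
record_version unzip -v
record_version iconv --version

cat "$out"
```

Writing "not installed" instead of aborting keeps the record complete even on machines where some tools are absent.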

Reproducibility - Neuroanatomical studies

FreeSurfer software
- computes cortical thickness and volume of neuroanatomical structures

Measurements differed across:
- FreeSurfer versions
  • v4.3.1, v4.5.0, v5.0.0
- workstations
  • Mac, Hewlett-Packard
- operating system versions
  • Mac OS X 10.5, Mac OS X 10.6

E. Gronenschild, P. Habets, H. I. L. Jacobs, R. Mengelers, N. Rozendaal, J. van Os, and M. Marcelis, “The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements,” 2012.

Reproducibility - Computer science

613 papers in 8 ACM conferences

C. Collberg and T. Proebsting, “Measuring reproducibility in computer systems research,” 2014. [Online]. Available: http://reproducibility.cs.arizona.edu/tr.pdf

Reproducibility - Computer science

E-mail responses from authors:
- wrong version
- code will be available soon
- programmer left
- bad backup practices
- commercial code
- proprietary academic code
- intellectual property
- no intention to release
- …

Variety of solutions

- workflow systems
- interactive notebooks
- virtualisation
- containers
- code repositories
- automated builds
- service monitoring
- metadata standards
- provenance
- preservation planning
- repositories

TIMBUS - Process preservation

- digital preservation of business processes
- based on risk management
- context modelling is the key

TIMBUS - Context modelling

- context model
- automated extractors
- process execution monitoring
- service monitoring
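The TIMBUS tooling itself is not shown here; the shell sketch below only illustrates the idea of process execution monitoring in a much simpler, hypothetical form. Each workflow step runs through a wrapper that appends the step name, UTC start and end time, exit status and command line to a tab-separated log. The helper `run_step`, the file `execution_log.tsv` and the `echo`/`false` placeholder steps are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of process execution monitoring (not the TIMBUS tooling).
# No 'set -e': a failing step must still be recorded in the log.
set -u

log="execution_log.tsv"
printf 'step\tstart\tend\texit_status\tcommand\n' > "$log"

# Run one step, then append its name, UTC start/end time, exit status and
# command line to the log. Returns the step's own exit status.
run_step() {
  local step="$1"
  shift
  local start end status
  start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  "$@"
  status=$?
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '%s\t%s\t%s\t%s\t%s\n' "$step" "$start" "$end" "$status" "$*" >> "$log"
  return "$status"
}

# Placeholder steps; a real workflow would call java, R, pdflatex, etc.
run_step fetch echo "fetching data"
run_step convert echo "converting encoding"
run_step broken false

cat "$log"
```

Because the wrapper returns the step's exit status, it can be combined with `||` or `if` to stop a workflow while still leaving a complete log of what ran.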

TIMBUS - Risk mitigation strategies

- metadata and documentation
- migration
  • file formats
  • storage media
  • alternative services: open-source services, in-housing of services
- emulation
- virtualisation
- mock-up of systems

Summary

Scientific experiments
- workflows for data processing with software dependencies

Risks affecting reproducibility
- reproducibility is low due to insufficient experiment descriptions

Solutions for improving reproducibility
- improve data management, sharing and reuse

TIMBUS approach for process preservation
- based on risk management practices
- uses context modelling to evaluate preservation alternatives