7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Provenance Framework in Kepler...

7th Biennial Ptolemy Miniconference Berkeley, CA February 13, 2007 Provenance Framework in Kepler Provenance Framework in Kepler Ilkay Altintas Norbert Podhorszki Contributors: S. Bowers, B. Ludäscher, T. McPhillips (UC Davis) O. Barney (U Utah), E. Jäger-Frank (SDSC)

7th Biennial Ptolemy Miniconference

Berkeley, CA, February 13, 2007

Provenance Framework in Kepler

Ilkay Altintas, Norbert Podhorszki

Contributors:

S. Bowers, B. Ludäscher, T. McPhillips (UC Davis)

O. Barney (U Utah), E. Jäger-Frank (SDSC)


Outline

Provenance? What is it?

Framework in Kepler to record provenance data

RWS: A provenance model suitable for Kepler's different computational models

Possible Applications of Provenance


What to track and why

Do we need some tracking of what is happening?

Recreate results and rebuild workflows using the evolution information (see repeatable experiments)

Associate the workflow with the results it produced

Create links between generated data in different runs, and compare different runs

Recover from a system failure

• Checkpoint a workflow

Debug and explain results (via lineage tracing, …)

Smart Reruns

• Avoid re-generating the same data all the time


Model of Provenance

Core feature

capture the processing history (trace) leading to a data product

Model of Computation (MoC)

Well-defined in terms of input/output relations and the (partial) order of actions

• M_MoC(P_Program, I_Input) → O_Output

• DAG, SDF, DDF, PN, etc.

Different ways of specification

• see Ptolemy-related papers, the Kahn-MacQueen paper, etc.

• give abstract/high-level pseudo code

• In practice it is defined through the implementation of the execution system (including the scheduling). In Kepler/Ptolemy this is the Director.

There are legal (possible) runs under a given MoC


Model of Provenance

Model of Provenance (MoP)

The starting point is a MoC and its particular implementation

• Observables: e.g. a single fired(x, A, y), or reads, writes and actions separately

• Trace: recorded assertions (about observable events) during a legal run

A MoP is a MoC, except that “legal run” is replaced with “legal trace”

There is a default MoP for a MoC: the total trace of all observable events

• Turing machine: moves of the head, data read and written

A MoP may add further information or omit some (“T=R-I+M”)

• Trace = Run – Ignored things + Modelled additional things

• M: Add real timestamps of actions, execution host information

• I: Omit the input for each action if this can be inferred unambiguously later (DAG)

• Depends on the application of the trace
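The “T = R - I + M” rule above can be illustrated with a minimal Python sketch. The `Event` shape, the `make_trace` helper and the `host`/`step` fields are assumptions made for illustration only; they are not part of Kepler. A logical step number stands in for the real timestamp mentioned under M so that the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observable event of a run (hypothetical shape)."""
    kind: str   # "r" for a read, "w" for a write
    actor: str
    token: str


def make_trace(run, ignored, modelled):
    """Trace = Run - Ignored things + Modelled additional things."""
    trace = []
    for step, event in enumerate(run):
        if ignored(event):
            continue  # I: omit what can be inferred later
        record = {"kind": event.kind, "actor": event.actor, "token": event.token}
        record.update(modelled(step, event))  # M: add modelled information
        trace.append(record)
    return trace


run = [
    Event("r", "A", "x1"), Event("w", "A", "y1"),
    Event("r", "B", "y1"), Event("w", "B", "z1"),
]

# DAG-style MoP: reads are ignored because they can be inferred from the graph;
# the host name and step number are the modelled additions (illustrative values).
dag_trace = make_trace(
    run,
    ignored=lambda e: e.kind == "r",
    modelled=lambda step, e: {"step": step, "host": "node-1"},
)
```

Swapping in a different `ignored` or `modelled` function yields a different MoP over the same run, which is the point of the last bullet: the choice depends on what the trace will be used for.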



MoP Examples

DAG workflow

Record: the output data generated by the actions

Inference: the execution of actions and the inputs to them can be inferred from the DAG itself

Smart-rerun

Record: the output of each action and the parameters for that action

Inference: if an action’s parameters are unchanged, and the actions on which it depends (inferred from the workflow graph) are also unchanged, the action’s output will be the same in a future run.

Kitchen definition

A MoP is “good” if it can handle the intended questions & use cases.
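The smart-rerun inference above can be sketched as a small graph walk. This is only an illustration under stated assumptions: the function name `actions_to_rerun`, the dictionary-based workflow description and the sample actions are hypothetical, not Kepler code.

```python
def actions_to_rerun(depends_on, old_params, new_params):
    """Return the actions whose recorded output cannot be reused.

    depends_on maps every action to the actions it depends on
    (taken from the workflow graph, assumed acyclic).
    """
    dirty = set()

    def is_dirty(action):
        if action in dirty:
            return True
        changed = old_params.get(action) != new_params.get(action)
        upstream_dirty = any(is_dirty(dep) for dep in depends_on.get(action, ()))
        if changed or upstream_dirty:
            dirty.add(action)
            return True
        return False

    for action in depends_on:
        is_dirty(action)
    return dirty


# Hypothetical four-action workflow.
depends_on = {"load": [], "filter": ["load"], "plot": ["filter"], "stats": ["load"]}
old_params = {"load": {"file": "a.csv"}, "filter": {"min": 0},
              "plot": {"color": "red"}, "stats": {}}
new_params = dict(old_params, filter={"min": 5})

rerun = actions_to_rerun(depends_on, old_params, new_params)
```

Here only `filter` and its dependent `plot` need to run again; the recorded outputs of `load` and `stats` can be reused.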


MoP Examples

Kepler: Streaming actors

Stateful actors

• An output depends on all inputs in the past, e.g. AddSubtract

Stateless actors

• An output depends only on inputs read in the current firing, e.g. Expression, RecordAssembler

Non-conformist actors

• Filter, Running average, Daily average (an output depends on some of the past inputs)

• How do you determine correctly which inputs a given output depends on?


MoP Examples

Kepler: Data-dependent routing (branches and loops)

The firing history of the actors cannot be inferred from the static workflow graph

• Something must therefore be recorded at run time (e.g. firings)


RWS

A Model of Provenance for Kepler Directors


RWS: Read − Write − State-reset

What about actor state? What about “real” dependencies?

r, r … r, w, w, … w, r, … r, w, … w …

State-reset event s defines when an actor “cuts off” dependencies

• a semantic notion, known to the actor [developer] (or part of a higher-order scheme)

r, r … r, w, w, … w, s!, r, … r, w, ... w, …

Reference: IPAW’06, Bowers et al.

[Figure: actor A with reads (r … r) and writes (w … w) along a time axis, showing firings and the state-reset s!; labels “PS” and “???”]
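One plausible reading of the state-reset rule is that a written token depends on every token the actor has read since its last state-reset. The sketch below applies that reading to a single actor's event sequence. The function name and the `(kind, token)` tuple format are assumptions for illustration; the exact dependency semantics are defined in the referenced IPAW’06 paper, not here.

```python
def rws_dependencies(events):
    """Map each written token to the tokens read since the last state-reset.

    events is one actor's sequence of (kind, token) pairs, where kind is
    "r" (read), "w" (write) or "s" (state-reset, token ignored).
    """
    reads_since_reset = []
    dependencies = {}
    for kind, token in events:
        if kind == "r":
            reads_since_reset.append(token)
        elif kind == "w":
            dependencies[token] = list(reads_since_reset)
        elif kind == "s":
            reads_since_reset = []  # the actor "cuts off" earlier dependencies
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return dependencies


with_reset = rws_dependencies(
    [("r", "a"), ("r", "b"), ("w", "x"), ("s", None), ("r", "c"), ("w", "y")]
)
without_reset = rws_dependencies(
    [("r", "a"), ("r", "b"), ("w", "x"), ("r", "c"), ("w", "y")]
)
```

With the reset, `y` depends only on `c`; without it, `y` is conservatively attributed to `a`, `b` and `c`.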


RWS trace of some actors

Stateless actor: (r+ w+ s)*, i.e. r … r w … w s r … r w … w s …

Stateful actor: (r+ w+)*

Simple filter actor (conditional depends only on current token): (r w? s)*, i.e. it either emits a token or not

Daily average of hourly measurements: ((r w)^24 s)*

Generally: RWS firing is defined in terms of r and w events

r+ w+ defines one RWS firing (most Kepler actors behave similarly)

More general: definition of the RWS firing round

(r+ w+)* s : dependencies among several firings
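The trace patterns above are regular expressions over the alphabet {r, w, s}, so a recorded trace can be checked against them directly. A minimal sketch, assuming each trace is stored as a string of event letters (the dictionary keys and the `conforms` helper are illustrative names):

```python
import re

# Patterns copied from the slide, written without spaces.
TRACE_PATTERNS = {
    "stateless": r"(r+w+s)*",
    "stateful": r"(r+w+)*",
    "simple_filter": r"(rw?s)*",
    "daily_average": r"((rw){24}s)*",
}


def conforms(actor_kind, trace):
    """True if the whole trace matches the pattern for the given actor kind."""
    return re.fullmatch(TRACE_PATTERNS[actor_kind], trace) is not None


stateless_ok = conforms("stateless", "rrwsrwws")
stateless_missing_reset = conforms("stateless", "rrwrw")
filter_ok = conforms("simple_filter", "rwsrsrws")
one_day = conforms("daily_average", "rw" * 24 + "s")
short_day = conforms("daily_average", "rw" * 23 + "s")
```

Note that every pattern also accepts the empty trace, since each is a Kleene star over a firing or firing round.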


Kepler Provenance Framework


Provenance Framework in Kepler

Modeled as a separate concern in the system

Optional drag and drop feature

Listen to execution and save information (customizable):

• Context: the who, what, where and when associated with the run

• Input data and its associated metadata

• Workflow outputs and intermediate data products

• Workflow definition (entities, parameters, connections): a specification of what exists in the workflow; it can have a context of its own

• Information about the workflow evolution (the workflow trail)


Kepler System Architecture

[Figure: Kepler system architecture. Components shown: Vergil GUI, Kepler GUI Extensions, Kepler Core Extensions, Ptolemy, Authentication, SMS, Actor & Data Search, Type System Ext, Provenance Recorder, Kepler Object Manager, Documentation, Smart Re-run / Failure Recovery. Source: IPAW’06, Altintas et al.]


Kepler Provenance Recorder (IPAW’06, Altintas et al.)

• Parametric and customizable

– Different report formats

– Variable levels of verbosity: all, some, medium, on error

– Multiple cache destinations

• Saves information on

– User name, Date, Run, etc…


Implementation details

The Provenance Recorder

Extends the Ptolemy AbstractSettableAttribute

Listens to the Director for

• Changes in the workflow graph

• Initialization, workflow execution and stop

• Actor firing

Listens to all IOPorts for

• Token emissions on output ports to record output data

That is, we could say it is a Ptolemy Provenance Framework


Implementation details

Builds an internal representation of the workflow graph

Ptolemy’s DirectedGraph

Nodes: IOPorts, Edges: port connections

Used for

• Recording workflow structure (dependencies among ports)

• Subscribing at all ports (listening for input/output)


Application: smart-rerun


Implementation of RWS in Kepler

Data model, i.e. observables in all MoC implementations in Kepler

Port-actor relationship

• portTable(Port, Actor, type)

• type is a for atomic and c for composite actors (transparent)

Token-object relationship

• tokenTable(Token, Object)

Object-value relationship

• objectTable(Object, Value, Type)

• type is currently not recorded

RWS trace

• traceTable(Port, Event, Token, FiringCounter)

• event: r as read, w as write or s as state-reset
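To make the four relations above concrete, here is a minimal sketch that loads them into an in-memory SQLite database and asks which values an actor wrote. Only the table and column names come from the slide; the column types, the sample rows, the `written_values` helper and the use of SQLite itself are assumptions for illustration, not a description of how Kepler stores the trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portTable   (port TEXT, actor TEXT, type TEXT);  -- type: 'a' atomic, 'c' composite
CREATE TABLE tokenTable  (token TEXT, object TEXT);
CREATE TABLE objectTable (object TEXT, value TEXT, type TEXT); -- type left NULL (not recorded)
CREATE TABLE traceTable  (port TEXT, event TEXT, token TEXT, firingCounter INTEGER);
""")

# Hypothetical run: actor A reads token t1 (value 3) and writes token t2 (value 9).
conn.executemany("INSERT INTO portTable VALUES (?, ?, ?)",
                 [("A.in", "A", "a"), ("A.out", "A", "a")])
conn.executemany("INSERT INTO tokenTable VALUES (?, ?)",
                 [("t1", "o1"), ("t2", "o2")])
conn.executemany("INSERT INTO objectTable VALUES (?, ?, ?)",
                 [("o1", "3", None), ("o2", "9", None)])
conn.executemany("INSERT INTO traceTable VALUES (?, ?, ?, ?)",
                 [("A.in", "r", "t1", 1), ("A.out", "w", "t2", 1)])


def written_values(conn, actor):
    """Values written by the given actor, in firing order."""
    query = """
        SELECT o.value
        FROM traceTable t
        JOIN portTable p   ON p.port = t.port
        JOIN tokenTable k  ON k.token = t.token
        JOIN objectTable o ON o.object = k.object
        WHERE p.actor = ? AND t.event = 'w'
        ORDER BY t.firingCounter
    """
    return [row[0] for row in conn.execute(query, (actor,))]
```

Keeping tokens, objects and values in separate tables lets the trace refer to a token without copying its value into every event row.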


Extending the framework

1. Initialization (initialize())

Framework traverses the workflow graph (ports and connections)

RWS: generates specific data structures (port, actor and connection details)

2. Just before start (validate())

Framework subscribes its event listeners

RWS: subscribes an additional listener for TokenGetEvent


Extending the framework

3. When the workflow is modified (changeExecuted())

Framework traverses the workflow graph (ports and connections)

RWS: re-generates its data structures

4. During execution, when an event occurs

The TokenSendEvent() and TokenGetEvent() listeners are extended to generate RWS trace events
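The four extension points can be pictured as a base recorder whose hooks an RWS-specific subclass overrides. The sketch below is a Python stand-in, not the Kepler/Ptolemy Java API: the class names, the snake_case method names, the `"Actor.port"` naming convention and the list of listener names are all assumptions for illustration.

```python
class FrameworkRecorder:
    """Stand-in for the generic provenance recorder (hypothetical)."""

    def __init__(self):
        self.ports = []
        self.listeners = []
        self.trace = []

    def initialize(self, connections):
        # Step 1: traverse the workflow graph (ports and connections).
        self.ports = sorted({port for edge in connections for port in edge})

    def validate(self):
        # Step 2: subscribe the framework's listeners just before start.
        self.listeners = ["TokenSendEvent"]

    def change_executed(self, connections):
        # Step 3: the workflow was modified, so traverse the graph again.
        self.initialize(connections)

    def token_sent(self, port, token):
        # Step 4: an output port emitted a token.
        self.trace.append((port, "w", token))

    def token_got(self, port, token):
        # The base recorder ignores reads.
        pass


class RWSRecorder(FrameworkRecorder):
    """RWS extension: extra data structures and read events."""

    def initialize(self, connections):
        super().initialize(connections)
        # Assumes ports are named "Actor.port".
        self.actor_of = {port: port.split(".")[0] for port in self.ports}

    def validate(self):
        super().validate()
        self.listeners.append("TokenGetEvent")

    def token_got(self, port, token):
        self.trace.append((port, "r", token))


recorder = RWSRecorder()
recorder.initialize([("A.out", "B.in")])
recorder.validate()
recorder.token_sent("A.out", "t1")
recorder.token_got("B.in", "t1")
```

Because `change_executed` calls `initialize` through the subclass, the RWS data structures are regenerated automatically whenever the graph changes, mirroring step 3.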


Possible applications of Provenance

Smart-rerun

Monitoring/debugging of a workflow

• see the LiDAR poster today by Efrat Jäger-Frank

Answering processing-history and data-related questions

• Participated in the First Provenance Challenge with Kepler-RWS: http://twiki.ipaw.info/bin/view/Challenge/RWS

Reporting/documentation of workflows and data products

• “Generate my publication”


Acknowledgement

RWS model

Shawn Bowers and Timothy McPhillips, UC Davis

Formalization of the MoPs

Bertram Ludäscher, UC Davis

Kepler Provenance Framework implementation

Oscar Barney, Univ. of Utah, Salt Lake City

Efrat Jäger-Frank, SDSC, San Diego


References

RWS model

S. Bowers, T. McPhillips, B. Ludäscher, S. Cohen and S. B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.

B. Ludäscher, N. Podhorszki, I. Altintas, S. Bowers and T. McPhillips. From Computation Models to Models of Provenance and the RWS Model. To appear in 2007 in Journal of Concurrency and Computation: Practice and Experience.

Provenance framework

I. Altintas, O. Barney and E. Jäger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.