Scientific workflow management system based on Ptolemy II Allows scientists to visually design and...


Scientific workflow management system based on Ptolemy II

Allows scientists to visually design and execute scientific workflows

Actor-oriented model with directors acting as the main workflow engine

Enables different models of computation


Scientific workflow: models the flow of data from one step to another in a series of computations to achieve some scientific goal


Ptolemy II: software system for modeling, simulation, and design of concurrent, real-time, embedded systems, developed at UC Berkeley

Objective: “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”


Directors, Actors, Ports, Relations

[Figure: diagram labeling actors, ports, links, a relation, a connection, and attributes]


Directors control execution of the workflow (scheduling, dispatching threads, etc.)

Actors are executable components of a workflow

Directors govern execution of actors


Actor-/dataflow orientation vs. object-/control-flow orientation


Every Kepler workflow needs a director

Execute networks of components under multiple execution models
› Synchronous vs. parallel vs. dataflow vs. time-based vs. event-based vs. all combined

Computation model dictates semantics for component interaction


Make use of separation of concerns
› e.g., component execution, workflow execution, and provenance tracking

Managers act like a “common execution environment”
› governing different concerns related to execution of the network and services


CT – continuous time modeling

DE – discrete event systems

FSM – finite state machines

PN – process networks

SDF – synchronous dataflow

DDF – dynamic dataflow

SR – synchronous/reactive systems
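As one illustration of how a model of computation shapes execution: in synchronous dataflow, each actor produces and consumes a fixed number of tokens per firing, so a firing order can be worked out before the workflow runs. The sketch below is plain Python, not Kepler or Ptolemy II code; the function names and the two-actor source/sink setup are assumptions made for this example.

```python
from math import gcd


def repetitions(produce, consume):
    """Smallest firing counts (source, sink) that leave the channel empty again."""
    common = produce * consume // gcd(produce, consume)  # least common multiple
    return common // produce, common // consume


def static_schedule(produce, consume):
    """Build a firing order for one source and one sink that never underflows."""
    source_left, sink_left = repetitions(produce, consume)
    buffered = 0
    order = []
    while source_left or sink_left:
        if sink_left and buffered >= consume:
            # Enough tokens are queued: the sink may fire.
            buffered -= consume
            sink_left -= 1
            order.append("sink")
        else:
            # Otherwise the source fires and adds its tokens to the channel.
            buffered += produce
            source_left -= 1
            order.append("source")
    return order


counts = repetitions(2, 3)
schedule = static_schedule(2, 3)
print(counts, schedule)
```

With a source that emits 2 tokens per firing and a sink that reads 3, the source must fire three times for every two sink firings. Other models in the list (PN or DE, for example) decide firing order at run time instead.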


Actors are reusable components that execute a variety of functions

Communicate with other actors in workflow through ports

Composite actor – aggregation of actors

Composite actor may have a local director
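The aggregation idea can be sketched in a few lines: a composite exposes the same interface as an atomic actor and hands execution of its inner actors to its own local director. This is plain Python, not the Kepler API; `Scale`, `Offset`, `PipelineDirector`, and `CompositeActor` are hypothetical names used only for illustration.

```python
class Scale:
    """Atomic actor: multiplies each token by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor


class Offset:
    """Atomic actor: adds a fixed amount to each token."""

    def __init__(self, amount):
        self.amount = amount

    def fire(self, token):
        return token + self.amount


class PipelineDirector:
    """Hypothetical local director: fires the inner actors one after another."""

    def run(self, actors, token):
        for actor in actors:
            token = actor.fire(token)
        return token


class CompositeActor:
    """Aggregation of actors that looks like a single actor from outside."""

    def __init__(self, actors, director):
        self.actors = actors
        self.director = director

    def fire(self, token):
        # The local director decides how the inner actors execute.
        return self.director.run(self.actors, token)


inner = CompositeActor([Scale(2), Offset(3)], PipelineDirector())
outer = CompositeActor([inner, Scale(10)], PipelineDirector())
inner_result = inner.fire(5)
outer_result = outer.fire(5)
print(inner_result, outer_result)
```

Because `inner` has the same `fire` interface as an atomic actor, it nests inside `outer` without the outer director knowing it is a composite.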


Top-level workflows can be a conceptual representation of the science process

Drilling down reveals increasing levels of detail

Composing models using hierarchy promotes development of re-usable components


Each actor implements several methods
› initialize() – initializes state variables
› prefire() – indicates if actor wants to fire
› fire() – main point of execution: read inputs, produce outputs, read parameter values
› postfire() – update persistent state, see if execution complete
› wrapup()

Each director calls these methods according to its model
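The lifecycle above can be sketched as follows. This is a plain-Python imitation of the method names listed on the slide, not the Ptolemy II Java classes; the `Accumulator` actor and the `run` function standing in for a director are assumptions for illustration.

```python
class Accumulator:
    """Hypothetical actor: emits a running sum and stops after `limit` firings."""

    def __init__(self, limit):
        self.limit = limit

    def initialize(self):
        # Initialize state variables before the first iteration.
        self.total = 0
        self.firings = 0
        self.pending = None
        self.outputs = []

    def prefire(self, inbox):
        # Indicate whether the actor wants to fire: only when input is waiting.
        return len(inbox) > 0

    def fire(self, inbox):
        # Main point of execution: read an input, produce an output.
        token = inbox.pop(0)
        self.pending = self.total + token
        self.outputs.append(self.pending)

    def postfire(self):
        # Update persistent state; return False once execution is complete.
        self.total = self.pending
        self.firings += 1
        return self.firings < self.limit

    def wrapup(self):
        # Clean up at the end of the run.
        self.pending = None


def run(actor, tokens):
    """Toy director: calls the lifecycle methods in a fixed order."""
    inbox = list(tokens)
    actor.initialize()
    try:
        while actor.prefire(inbox):
            actor.fire(inbox)
            if not actor.postfire():
                break
    finally:
        actor.wrapup()
    return actor.outputs


result = run(Accumulator(limit=3), [1, 2, 3, 4])
print(result)
```

Keeping the state update in `postfire()` rather than `fire()` mirrors the split described above: firing produces outputs, and persistent state changes only once the iteration is committed.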


Copy actor – copy files from one resource to another during execution
› Stage actor – local to remote host
› Fetch actor – remote to local host

Job execution actor – submit and run a remote job

Monitoring actor – notify user of failures

Service discovery actor – import web services from a service repository or web site

RExpression actors

MatlabExpression actors

Web services actors – given the WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method

Database connection and query actors


Ports used to produce and consume data and communicate with other actors in workflow
› Input port – data consumed by actor
› Output port – data produced by actor
› Input/output port – data both produced and consumed


Relations direct the same input or output to more than one port

Example: direct output to
1. display actor to show intermediate results, and
2. operational actor for further processing
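The fan-out in this example can be sketched as a small broadcast: one output is linked to two input ports, and both receive every token. The classes below are hypothetical stand-ins written in plain Python, not Kepler's own port or relation classes.

```python
class InputPort:
    """Receives tokens for the actor that owns it."""

    def __init__(self, name):
        self.name = name
        self.tokens = []

    def put(self, token):
        self.tokens.append(token)


class Relation:
    """Hypothetical relation: links one output to any number of input ports."""

    def __init__(self):
        self.destinations = []

    def link(self, port):
        self.destinations.append(port)

    def broadcast(self, token):
        # Every linked input port gets its own copy of the token reference.
        for port in self.destinations:
            port.put(token)


class OutputPort:
    """Sends tokens into the relation it is linked to."""

    def __init__(self, name, relation):
        self.name = name
        self.relation = relation

    def send(self, token):
        self.relation.broadcast(token)


display_in = InputPort("display.input")    # shows intermediate results
process_in = InputPort("process.input")    # continues the computation
relation = Relation()
relation.link(display_in)
relation.link(process_in)

source_out = OutputPort("source.output", relation)
for value in (10, 20):
    source_out.send(value)

print(display_in.tokens, process_in.tokens)
```

The source actor only knows its own output port; adding or removing a display is a change to the links, not to the actor.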


Execution options:
› inside GUI
› at command-line
› distributed computing


Kepler components can be shared by exporting workflow or component into a Kepler Archive (KAR) file (extension of JAR file format)

Component Repository is centralized system for sharing Kepler workflows

Users can search for components from repository from within Vergil
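Since a KAR file is described above as an extension of the JAR format, and JAR files are ZIP archives, generic ZIP tooling should be able to list what an exported archive contains. The sketch below assumes this holds for KAR files in practice; it builds a stand-in archive in memory, and the entry names are made up for illustration rather than taken from a real Kepler export.

```python
import io
import zipfile


def list_archive_entries(data):
    """Return the entry names stored in a JAR-style (ZIP) archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Build a stand-in archive in memory; these entry names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("example_workflow.xml", "<entity name=\"example\"/>")

entries = list_archive_entries(buffer.getvalue())
print(entries)
```

To inspect an actual export, the same function could be given the bytes read from the `.kar` file on disk.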


Kepler provides direct access to scientific data archived in many of the commonly used data archives.
› Ex.: access to data stored in the Knowledge Network for Biocomplexity (KNB) Metacat server and described using Ecological Metadata Language.

Additional supported data sources
› DiGIR protocol, OPeNDAP protocol, GridFTP, JDBC, SRB, and others.


Kepler ships by default with:
› Globus actors
› GridFTP actors

No BES implementation*

Job submission to OpenPBS, gLite

Kepler actors capable of using Unicore by Euforia (Poznań SC)

TeraGrid gateways exist that use Kepler


Actor data polymorphism:
› Add numbers (int, float, double, complex)
› Add strings (concatenation)
› Add complex types (arrays, records, matrices)
› Add user-defined types
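Data polymorphism means one "add" component adapts to whatever token types arrive. A minimal sketch in plain Python follows; the element-wise rule for arrays and the field-wise rule for records are assumptions chosen for this example, not a statement of how Kepler's own Add actor treats them.

```python
def add(left, right):
    """Hypothetical polymorphic add: behaviour is chosen from the operand types."""
    if isinstance(left, list) and isinstance(right, list):
        # Arrays: element-wise addition (assumed rule), recursing into elements.
        if len(left) != len(right):
            raise ValueError("arrays must have the same length")
        return [add(a, b) for a, b in zip(left, right)]
    if isinstance(left, dict) and isinstance(right, dict):
        # Records: field-wise addition (assumed rule) over identical field names.
        if left.keys() != right.keys():
            raise ValueError("records must have the same fields")
        return {key: add(left[key], right[key]) for key in left}
    # Numbers add, strings concatenate, and user-defined types use their own __add__.
    return left + right


number_sum = add(1, 2.5)
complex_sum = add(1, 2 + 3j)
text_sum = add("ab", "cd")
array_sum = add([1, 2], [3, 4])
record_sum = add({"x": 1, "label": "a"}, {"x": 2, "label": "b"})
print(number_sum, complex_sum, text_sum, array_sum, record_sum)
```

A user-defined type joins in by defining `__add__`, which is the Python counterpart of the "add user-defined types" item above.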


Distributed execution of workflow parts (peer to peer)

Efficient data transfer

Provenance tracking of data and processes

Tracking workflow evolution

Streaming data analysis

Easy-to-deploy batch interfaces

Intuitive workflow design

Customizable semantic typing

Interoperability with other workflow and analytical environments (at exec level)


Ecology
› SEEK: Ecological niche modeling and climate change
› REAP: Modeling parasite invasions in grasslands using sensor networks
› NEON: Ecological sensor networks; COMET: Environmental science

Geosciences
› GEON: LiDAR data processing, geological data integration
› NEESit: Earthquake engineering

Molecular biology
› SDM: Gene promoter identification and ScalaBLAST
› ChIP-chip: Genome-scale research; CAMERA: Metagenomics

Oceanography
› REAP: SST data processing; LOOKING/OOI CI: ocean observing CI
› ROADNet: real-time data modeling and analysis
› ATOL: Processing Phylodata; CiPRES: Phylogenetic tools

Chemistry
› Resurgence: Computational chemistry; DART/ARCHER: X-ray crystallography

Library science
› DIGARCH: Digital preservation; UK Text Mining Center: Cheshire feature and archival

Conservation biology
› SanParks: Thresholds of Potential Concerns

Physics
› SDM: astrophysics TSI-1 and TSI-2; CPES: Plasma fusion simulation; ITER-EU: ITM fusion workflows