Bertram Ludäscher San Diego Supercomputer Center ludaesch@SDSC

Page 1:

NeSCR Dec-3-2003 Bertram Ludaescher

Scientific Workflows Based on Dataflow Process Networks

(or from Ptolemy to Kepler) (or Workflow Considered Harmful …)

Bertram Ludäscher

San Diego Supercomputer Center

ludaesch@SDSC.edu

Page 2:

Overview

1. Scientific Workflow (SWF) Examples

2. SWF Requirements & Characteristics

3. Workflow standards considered harmful for SWF!?

4. Dataflow Process Networks (Ptolemy II)

5. Scientific Workflows (Kepler = Ptolemy II + X)

Page 3:

Acknowledgements I

• NSF, NIH, DOE

• GEOsciences Network (NSF) – www.geongrid.org

• Biomedical Informatics Research Network (NIH) – www.nbirn.net

• Science Environment for Ecological Knowledge (NSF) – seek.ecoinformatics.org

• Scientific Data Management Center (DOE) – sdm.lbl.gov/sdmcenter/

Page 4:

Acknowledgements II

• Ilkay Altintas (SDM)
• Chad Berkley (SEEK)
• Shawn Bowers (SEEK)
• Jeffrey Grethe (BIRN)
• Christopher H. Brooks (Ptolemy II)
• Zhengang Cheng (SDM)
• Efrat Jaeger (GEON)
• Matt Jones (SEEK)
• Edward A. Lee (Ptolemy II)
• Kai Lin (GEON)
• Bertram Ludaescher (BIRN, GEON, SDM, SEEK)
• Stephen Neuendorffer (Ptolemy II)
• Mladen Vouk (SDM)
• Yang Zhao (Ptolemy II)
• …

• Coming soon!?:
– ROADNet, myGrid, GriPhyN, ...

Ptolemy II

Page 5:

Promoter Identification Workflow (PIW)

Source: Matt Coleman (LLNL)

Page 6:

Promoter Identification Workflow in Ptolemy-II (SSDBM’03)

Execution Semantics

Page 7:

GARP Invasive Species Pipeline

[Figure: pipeline diagram. Actors: EcoGridQuery, LayerIntegration, SampleData, DataCalculation, MapGeneration, Validation, User, GenerateMetadata, ArchiveToEcogrid; annotations +A1, +A2, +A3. Data sources: Registered Ecogrid Databases. Data items: (a) species presence & absence points, (b) environmental layers, (c) integrated layers (each for native range and for invasion area); (d) training sample and test sample; (e) GARP rule set; (f) native range prediction map and invasion area prediction map; (g) model quality parameter; (h) selected prediction maps.]

Source: NSF SEEK (Deana Pennington et al., UNM)

Page 8:

Rock & Mineral Classification Workflow

Page 9:

A Look Inside Classification

Diagrams information and transitions between them.

Extracted from the mineral composition and this level’s diagram coordinates.

SVG to polygons.

Classifier: Locates the point’s region.

Finer granularity

Displays the point in the diagram for this level.

Page 10:

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Page 11:

SWF Requirements & Characteristics

• Scientist friendly "problem solving environment"
– WF design
– WF execution
– WF steering and UI
  • pause; revise; resume; rollback (cf. SCIRun)
– repositories of reusable components
– data and WF provenance (virtual data concept)
  • logging, cache reuse/partial re-derive, reports, …
– Conceptual modeling support
  • complex data (semantics) support
  • “wiring” support (cf. web service composition)
  • planning support

Page 12:

SWF Requirements & Characteristics

• "Modeling" support
– Abstraction, hierarchical modeling
– Models of Computation (MoC)
– component interaction; combination of MoCs (cf. CCA)
– WF multi-grain/granola: powder to boulders (and back)
  • Boolean (N)AND, (N)OR, … vs. chaining together Grid-apps
– Rich data structures and type systems

• End user "programming" support
– high-level programming constructs
  • e.g. map/3 for iteration, filter, select, branch, merge, ...
– data transformations
– legacy tool integration (plug-ins)
– data streaming
  • How to tame (e.g., starve a dataflow; then resume)? The Sorcerer's Apprentice (Zauberlehrling) problem

Page 13:

SWF Requirements & Characteristics

• Grid-enabling SWFs
– transparent use of (remote) resources
– big data
– big computation requirements
– early/late binding of logical to physical resources, …
– planning, scheduling, …

cf. Chimera, Pegasus, DAGman, Condor(-G)

Page 14:

Scientific Workflows: Some Findings

• More dataflow than (business) workflow
– but some branching, looping, merging, …
– not: documents/objects undergoing modifications
– instead often: dataset-out = analysis(dataset-in)

• Need for “programming extension”
– Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)

• Need for abstraction and nested workflows
• Need for data transformations (compute/transform alternations)
• Need for rich user interaction & workflow steering:
– pause / revise / resume
– select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF

• Need for high-throughput transfers (“grid-enabling”, “streaming”)
• Need for persistence of intermediate products
→ data provenance (“virtual data” concept)
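
The findings above (dataset-out = analysis(dataset-in), iteration over lists, filtering, functional composition, zip and map(f)) can be illustrated with a minimal Haskell sketch. All names here (Dataset, normalize, keepValid, pipeline, runAll, labelled) are hypothetical and are not taken from Kepler, Ptolemy II, or any other system mentioned on the slides.

```haskell
-- Hypothetical sketch: a scientific workflow as plain function composition.
-- None of these names come from Kepler or Ptolemy II.

type Dataset = [Double]

-- One analysis step: dataset-out = analysis(dataset-in).
-- Scales all values by the largest absolute value (no-op for empty or all-zero data).
normalize :: Dataset -> Dataset
normalize xs
  | null xs || m == 0 = xs
  | otherwise         = map (/ m) xs
  where m = maximum (map abs xs)

-- Filtering as a reusable step: keep only non-negative values.
keepValid :: Dataset -> Dataset
keepValid = filter (>= 0)

-- Functional composition wires the steps into a small pipeline.
pipeline :: Dataset -> Dataset
pipeline = normalize . keepValid

-- "foreach": apply the same sub-workflow to a list of datasets.
runAll :: [Dataset] -> [Dataset]
runAll = map pipeline

-- zip pairs each input name with its derived product.
labelled :: [String] -> [Dataset] -> [(String, Dataset)]
labelled names inputs = zip names (runAll inputs)
```

In GHCi, `pipeline [2, -1, 4]` first drops the negative value and then scales by the maximum. The point of the sketch is only that no explicit control flow is needed for this kind of step chaining.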

Page 15:

A ZOO of Workflow Standards and Systems

Source: W.M.P. van der Aalst et al. http://tmitwww.tm.tue.nl/research/patterns/

Page 16:

Business Workflows

• Business Workflows
– show their office automation ancestry
– documents and “work-tasks” are passed
– no data streaming, no data-intensive pipelines
– lots of standards to choose from: WfMC, WSFL, BPML, BPEL4WS, …, XPDL, …
– but often no clear execution semantics for constructs as simple as this:

Source: Expressiveness and Suitability of Languages for Control Flow Modelling in Workflows, PhD thesis, Bartosz Kiepuszewski, 2002

Page 17:

On Workflow Standards…

http://tmitwww.tm.tue.nl/staff/wvdaalst/Publications/publications.html

Page 18:

Workflow “Standards” Debunked

Source: Don’t go with the flow: Web services composition standards exposed, W.M.P. van der Aalst, in "Web Services - Been there done that?", Trends & Controversies, Jan/Feb 2003 issue of IEEE Intelligent Systems

Page 19:

Workflow “Standards” Debunked

Source: Don’t go with the flow: Web services composition standards exposed, W.M.P. van der Aalst, in "Web Services - Been there done that?", Trends & Controversies, Jan/Feb 2003 issue of IEEE Intelligent Systems

Page 20:

But never mind the standards discussion:

Many Scientific Workflows are Dataflows!

(Check YOUR examples …)

Page 21:

Commercial Workflow/Dataflow Systems

Page 22:

SCIRun: Component-Based Problem Solving Environments for Large-Scale Scientific Computing

• SCIRun: problem solving environment for interactive construction, debugging, and steering of large-scale scientific computations

• Component model, based on generalized dataflow programming

• Contact: Steve Parker (cs.utah.edu); SciDAC/SDM collaboration

Page 23:

Workflow and distributed computation grid created with Kensington Discovery Edition from InforSense.

Page 24:

Dataflow Process Networks: Putting Computation Models first!

• Synchronous Dataflow Network (SDF)
– Statically schedulable single-threaded dataflow
  • Can execute multi-threaded, but the firing-sequence is known in advance
– Maximally well-behaved, but also limited expressiveness

• Process Network (PN)
– Multi-threaded dynamically scheduled dataflow
– More expressive than SDF (dynamic token rate prevents static scheduling)
– Natural streaming model

• Other Execution Models (“Domains”)
– Implemented through different “Directors”

[Figure labels: actor, actor; typed i/o ports; FIFO; advanced push/pull]
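
The difference between the two models can be sketched in Haskell under some simplifying assumptions. For SDF, the sketch computes the static repetition counts for a single channel from fixed token rates. For PN, it models actors as functions on lazy lists, which stand in for the FIFO channels; this captures only the stream semantics, not the multi-threaded scheduling of a real director. All function names are hypothetical and none belong to the Ptolemy II API.

```haskell
-- Hypothetical sketch of the two models of computation named above.

-- SDF: token rates are fixed, so a firing schedule can be computed statically.
-- For one channel where the producer emits p tokens per firing and the
-- consumer reads c tokens per firing, the balance equation p * rP == c * rC
-- has the smallest positive solution computed below. Rates are assumed positive.
repetitions :: Int -> Int -> (Int, Int)
repetitions p c = (c `div` g, p `div` g)
  where g = gcd p c

-- PN: actors are functions on (possibly infinite) token streams;
-- lazy lists play the role of the FIFO channels.
type Stream a = [a]

-- A source actor that never terminates.
source :: Stream Int
source = [1 ..]

-- A rate-preserving actor: one output token per input token.
scale :: Int -> Stream Int -> Stream Int
scale k = map (* k)

-- An actor with a data-dependent output rate (0 or 1 tokens per input),
-- which is what rules out a static SDF schedule.
dropOdd :: Stream Int -> Stream Int
dropOdd = filter even

-- An actor with two input ports.
add :: Stream Int -> Stream Int -> Stream Int
add = zipWith (+)

-- A small network; tokens are only computed when a consumer demands them.
network :: Stream Int
network = add (scale 10 source) (dropOdd source)
```

Evaluating `take 3 network` pulls just enough tokens through the network to produce three results, which is the sense in which PN is a natural streaming model.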

Page 25:

Dataflow Process Networks and Ptolemy-II

see!

try!

read!

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Page 26:

Why Ptolemy-II?

• PTII Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”

• Data & Process oriented:
– Dataflow process networks

• Natural Data Streaming Support
• End user “WF console” (Vergil GUI)
• PRAGMATICS
– mature, actively maintained, well-documented
– open source system
– leverage “sister projects” activities (e.g. SEEK, SDM, BIRN, …)

Page 27:

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Page 28:

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Page 29:

Marrying & Divorcing Control- & Dataflow

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Page 30:

Another Goodie: Ptolemy-II Type System

Page 31:

Support for Multiple Workflow Granularities

[Figure labels: Boulders; Abstraction: Sand to Rocks; Sand; Powder; Plumbing]

Page 32:

Scientific Workflows = Dataflow Process Networks + X

• X = …
– Database plug-ins
– Legacy application plug-ins (via command line, as web services, …)
– Grid extensions:
  • Actors as web/grid services
  • 3rd party data transfer, high-throughput data streaming
  • Dealing with thousands of files (cf. astrophysics, astronomy, HEP, … examples)
  • Data and service repositories, discovery
– Extended type system (structural & semantic extensions)
– Programming extensions (declarative/FP)
– Rich user interactions/workflow steering
– Rich data transformations (compute/transform alternations)
– Data provenance
  • (semi-)automatic meta-data creation

Kepler = Ptolemy-II + X

Page 33:

Status update / specific tasks for Kepler
$DONE, %ONGOING, *NEW

• User interaction, workflow steering
– $ Pause/revise/resume
– $ BrowserUI actor (browser as a 0-learning display and selection tool)

• Distributed execution
– $ Dynamically port-specializing WSDL actor
– * Dynamically specializing Grid service actor

• Port & actor type extensions (SEEK leverage)
– * Structural types (XML Schema)
– * Semantic types (OWL) incl. unit types w/ automatic conversion

• Programming extensions
– % Data transformation actors (XSLT, XQuery, Python, Perl, …)
– * map, zip, zipWith, …, loop, switch “patterns”

• Specialized Data Sources
– $ EML (SEEK)
– % MS Access (GEON), * JDBC
– * XML, * NetCDF, …

Page 34:

Some specific tasks for Kepler (all NEW)

• Design & develop transparent, Grid-enabled PNs:
– Communication protocol details
– Grid-actor extensions and/or
– Grid-Process Network director (G-PN)
– Host/Source-location becomes actor parameter
  • add “active-inline” parameter display for grid-actors (@exec-loc), channels (@transport-protocol), source-actors (@{src-loc|catalog-loc})

• Activity Monitoring
– Add “activity status” display (green, yellow, red) to replace PtII animation (needed for concurrently executing PN!)

• Registration & Deployment mechanisms
– Actor/Data/Workflow repository (=composite actors)
– Shows up as (config’able) actor library
– OGSA Service Registry approach? (SEEK leverage; UDDI complex & limited says MattJ)
  • http://www-unix.globus.org/toolkit/draft-ggf-ogsi-gridservice-33_2003-06-27.pdf

• Extensions to deal with failures (fault tolerance)

Page 35:

Example: Database actors for Ptolemy II

(Kepler-GEON; Efrat Jaeger)

Page 36:

Database Actors

• Database Connection actor:

• Database Query actor:

Page 37:

Database Actors Example

Page 38:

Example: Web service-enabling Ptolemy II

(Kepler-SDM; Ilkay Altintas)

Page 39:

A Generic Web Service Actor

Configure – select WSDL URL from repository

Configure – select service operation

Page 40:

Set Parameters and Commit Specialized Actor

Set parameters and commit

Page 41:

Web Service Actor after Instantiation

Page 42:

Composing Third-Party Web Services

Output of previous web service

User interaction & Transformations

Input of next web service

Page 43:

Results of the Execution

User I/O via standard browser!

Run Window / WF Deployment

Page 44:

Composing Legacy Applications (here: Phylogeny): Shell / Command-Line Actors

Page 45:

Example: Grid-enabling Ptolemy II

(Kepler-SEEK, Chad Berkley; Kepler-SDM, Ilkay Altintas; … myGrid?, … GriPhyN?, … OGS{I|A}-[DAI] ...)

Page 46:

Transparently Grid-Enabling PTII: Handles

[Figure: actors A and B in PTII space; GA and GB in Grid space; arrows numbered 1 to 7 as listed below]

1. A→GA: get_handle
2. GA→A: return &X
3. A→B: send &X
4. B→GB: request &X
5. GB→GA: request &X
6. GA→GB: send *X
7. GB→B: send done(&X)

Example: &X = “GA.17”, *X = <some_huge_file>

Logical token transfer (3) requires get_handle(1,2); then exec_handle(4,5,6,7) for completion.
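
The seven steps above can be modelled as a small pure Haskell sketch. It assumes that each grid host is just a key-value store and that a handle is a string such as "GA.17". The names getHandle, execHandle, Store, and transfer are hypothetical; this is not an actual Kepler, Ptolemy II, or Globus interface.

```haskell
import qualified Data.Map as Map

-- Hypothetical model of the handle protocol; not an actual Kepler or Globus API.

type Handle = String                 -- e.g. "GA.17"
type Store  = Map.Map Handle String  -- data held by one grid host

-- Steps 1-2: the grid host registers the payload and returns a handle &X.
getHandle :: String -> Int -> String -> Store -> (Handle, Store)
getHandle host n payload store = (h, Map.insert h payload store)
  where h = host ++ "." ++ show n

-- Steps 4-7: the receiving side resolves the handle. The payload *X is copied
-- from GA's store to GB's store; only a completion notice goes back to B.
execHandle :: Handle -> Store -> Store -> Maybe (Store, String)
execHandle h ga gb = do
  payload <- Map.lookup h ga
  return (Map.insert h payload gb, "done(" ++ h ++ ")")

-- Step 3 is the only transfer inside PTII space: actor A passes the handle to B.
-- Here the handle is simply threaded from getHandle to execHandle.
transfer :: Maybe (Store, String)
transfer =
  let (h, ga) = getHandle "GA" 17 "<some_huge_file>" Map.empty
  in execHandle h ga Map.empty
```

The design point the sketch tries to show is that the large payload never passes through the actors A and B; they only exchange the small handle and the completion notice.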

Page 47:

Transparently Grid-Enabling PTII

• Different phases
– Register designed WF (could include external validation service)
– Find suitable grid service hosts for actors
– Pre-stage execution
– Execute (w/ provenance)
  • Interactively steer (pause; revise; resume)
  • Batch process; re-run parts later
– Register/store data products and execution logs

• Kepler implementation choices:
– Grid-actors (no change of Director necessary!?) and/or
– Grid-(PN)-director (also need to change actors!?)
– Add grid service host id as actor parameter: A@GA
– Similar for data: myDB@GA

Page 48:

“C-z ; bg &” – Detach your WF execution!

• Currently in PTII
– tight coupling of WF execution and PTII Java client (also Vergil GUI)

• To-do for Kepler:
– detaching WF console (Vergil) from a Grid-aware execution engine

[Figure callouts: Grid-PN Director!; Transport protocol parameter; Data location parameter; Host location parameter]

Page 49:

Semantic Type-enabling Ptolemy II (OWL – here we go… ;-)

(Kepler-SEEK; Shawn Bowers)

Page 50:

Semantic Type Extensions

• Take concepts and relationships from an ontology to “semantically type” the data-in/out ports

• Application: e.g., design support:
– smart/semi-automatic wiring, generation of “massaging actors”

[Figure: actor m1 (normalize) with ports p3 and p4]

Takes Abundance Count Measurements for Life Stages

Returns Mortality Rate Derived Measurements for Life Stages

Page 51:

Page 52:

Page 53:

Semantic Types

• The semantic type signature
– Type expressions over the (OWL) ontology

[Figure: actor m1 (normalize) with ports p3 and p4]

SemType m1 ::
  Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
  ->
  DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Page 54:

Extended Type System (here: OWL Semantic Types)

SemType m1 ::
  Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
  ->
  DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Substructure association:

XML raw-data =(X)Query=> object model =link=> OWL ontology
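
One way to picture how such signatures could support wiring decisions is the following Haskell sketch. It treats a semantic type as a plain set of the conjuncts from the SemType expression and checks compatibility by set inclusion. This is a deliberately crude, purely syntactic assumption: real OWL reasoning (subclass and property inference) is not modelled, and the names Signature, compatible, and canWire are hypothetical.

```haskell
import qualified Data.Set as Set

-- Hypothetical sketch: a semantic type as a conjunction of ontology terms.
-- Real OWL reasoning (subclass and property inference) is not modelled here.
type SemType = Set.Set String

data Signature = Signature { inType :: SemType, outType :: SemType }

-- The signature of actor m1, transcribed from the SemType expression.
m1 :: Signature
m1 = Signature
  { inType  = Set.fromList
      [ "Observation"
      , "itemMeasured.AbundanceCount"
      , "hasContext.appliesTo.LifeStageProperty" ]
  , outType = Set.fromList
      [ "DerivedObservation"
      , "itemMeasured.MortalityRate"
      , "hasContext.appliesTo.LifeStageProperty" ]
  }

-- A port that offers type `offered` may feed a port that requires `required`
-- if every required conjunct is offered (a purely syntactic approximation).
compatible :: SemType -> SemType -> Bool
compatible offered required = required `Set.isSubsetOf` offered

-- Can actor `a` be wired directly in front of actor `b`?
canWire :: Signature -> Signature -> Bool
canWire a b = compatible (outType a) (inType b)
```

Under this approximation m1 cannot be chained directly to itself, since its output offers mortality rates while its input requires abundance counts; a design tool could use such a mismatch as the trigger for suggesting a "massaging actor".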

Page 55:

Programming Extensions

(some lessons from SciDAC/SSDBM demo)

Page 56:

Promoter Identification Workflow in Ptolemy-II (SSDBM’03)

[Figure callouts:]
– hand-crafted control solution; also: forces sequential execution!
– designed to fit
– designed to fit
– hand-crafted Web-service actor
– Complex backward control-flow
– No data transformations available

Page 57:

Promoter Identification Workflow in FP

genBankG       :: GeneId -> GeneSeq
genBankP       :: PromoterId -> PromoterSeq
blast          :: GeneSeq -> [PromoterId]
promoterRegion :: PromoterSeq -> PromoterRegion
transfac       :: PromoterRegion -> [TFBS]
gpr2str        :: (PromoterId, PromoterRegion) -> String

d0 = Gid "7"                -- start with some gene-id
d1 = genBankG d0            -- get its gene sequence from GenBank
d2 = blast d1               -- BLAST to get a list of potential promoters
d3 = map genBankP d2        -- get list of promoter sequences
d4 = map promoterRegion d3  -- compute list of promoter regions and ...
d5 = map transfac d4        -- ... get transcription factor binding sites
d6 = zip d2 d4              -- create list of pairs promoter-id/region
d7 = map gpr2str d6         -- pretty print into a list of strings
d8 = concat d7              -- concat into a single "file"
d9 = putStr d8              -- output that file
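
The slide gives only type signatures for the services. To make the pipeline loadable in GHCi, the sketch below supplies stand-in implementations that fabricate strings instead of calling GenBank, BLAST, or TRANSFAC. Everything except the function names and types from the slide is an assumption: the newtype definitions, the constructor names other than Gid, the stub bodies, and the helper names piw and bindingSites.

```haskell
-- Hypothetical stand-ins for the remote services; the real workflow calls
-- GenBank, BLAST, and TRANSFAC. Only the names and types follow the slide.

newtype GeneId         = Gid String
newtype PromoterId     = Pid String
newtype GeneSeq        = GeneSeq String
newtype PromoterSeq    = PromoterSeq String
newtype PromoterRegion = PromoterRegion String
newtype TFBS           = TFBS String

genBankG :: GeneId -> GeneSeq
genBankG (Gid g) = GeneSeq ("seq-of-gene-" ++ g)

genBankP :: PromoterId -> PromoterSeq
genBankP (Pid p) = PromoterSeq ("seq-of-" ++ p)

-- The stub always "finds" two candidate promoters.
blast :: GeneSeq -> [PromoterId]
blast (GeneSeq s) = [Pid (s ++ "/p" ++ show i) | i <- [1 :: Int, 2]]

promoterRegion :: PromoterSeq -> PromoterRegion
promoterRegion (PromoterSeq s) = PromoterRegion ("region(" ++ s ++ ")")

transfac :: PromoterRegion -> [TFBS]
transfac (PromoterRegion r) = [TFBS ("tfbs@" ++ r)]

gpr2str :: (PromoterId, PromoterRegion) -> String
gpr2str (Pid p, PromoterRegion r) = p ++ "\t" ++ r ++ "\n"

-- Steps d1 to d8 for one gene, written as a single function.
piw :: GeneId -> String
piw g = concatMap gpr2str (zip promoters regions)
  where
    promoters = blast (genBankG g)
    regions   = map (promoterRegion . genBankP) promoters

-- Binding sites per promoter region (d5 on the slide).
bindingSites :: GeneId -> [[TFBS]]
bindingSites = map (transfac . promoterRegion . genBankP) . blast . genBankG
```

With these stubs, `putStr (piw (Gid "7"))` in GHCi prints one line per candidate promoter, mirroring d9 on the slide.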

Page 58:

Cleaned up Process Network PIW

• Back to purely functional dataflow process network (= also a data streaming model!)

• Re-introducing map(f) to Ptolemy-II (was there in PT Classic)
– no control-flow spaghetti
– data-intensive apps
– free concurrent execution
– free type checking
– automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]

[Figure callouts:]
– map(f)-style iterators
– Powerful type checking
– Generic, declarative “programming” constructs
– Generic data transformation actors
– Forward-only, abstractable sub-workflow piw(GeneId)

Page 59:

Optimization by Declarative Rewriting I

• PIW as a declarative, referentially transparent functional process
– optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)

• Details:
– Technical report & PIW specification in Haskell

[Figure callouts:]
– map(f o g) instead of map(f) o map(g)
– Combination of map and zip

http://kbi.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
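
The rewrite rule can be checked on ordinary Haskell lists. The sketch below uses two arbitrary illustrative functions f and g (they are not part of the PIW) and also shows the standard identity behind combining map and zip, namely that mapping a binary operation over zipped pairs equals zipWith.

```haskell
-- Sketch of the rewrite rule on ordinary Haskell lists.
-- f and g are arbitrary illustrative functions, not part of the PIW.

f :: Int -> Int
f = (* 2)

g :: Int -> Int
g = (+ 1)

-- Two traversals with one intermediate list.
twoPasses :: [Int] -> [Int]
twoPasses = map f . map g

-- One traversal after rewriting with map f . map g == map (f . g).
onePass :: [Int] -> [Int]
onePass = map (f . g)

-- Combining map and zip: mapping over zipped pairs ...
pairSums :: [Int] -> [Int] -> [Int]
pairSums xs ys = map (uncurry (+)) (zip xs ys)

-- ... is the same as a single zipWith.
pairSums' :: [Int] -> [Int] -> [Int]
pairSums' = zipWith (+)
```

Both forms give the same result for every input list; the rewritten form avoids the intermediate list, which for a workflow corresponds to fusing two actors into one.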

Page 60:

Optimizing II: Streams & Pipelines

• Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. mapS f • mapS g ⇒ mapS (f • g)

Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney

Page 61:

Summary

• Many (most of ours anyways) scientific workflows are dataflows
– lots of workflow “standards” (messy and not focused on SWF problems)
– should we start a new wave of dataflow standards??

• Importance of clear semantics for
– different MoCs (models of computation: PN, SDF, DE, CT, …)
– component composition across MoCs
– component interaction Ptolemy II directors

• Kepler:
– Based on extensible Ptolemy II system
– Cross-project activity (SEEK, SDM, Ptolemy II, GEON, BIRN, and counting)
– Plug-in / interface with your SWF planner, execution engine, grid-WF tool!

Page 62:

Your Projects & Icons <HERE>

Page 63:

A Note on the Style of these Slides

Due to lack of time, most of the following slides are “by reference” only ;-)

– …Each speaker was given four minutes to present his paper, as there were so many scheduled -- 198 from 64 different countries. To help expedite the proceedings, all reports had to be distributed and studied beforehand, while the lecturer would speak only in numerals, calling attention in this fashion to the salient paragraphs of his work. ... Stan Hazelton of the U.S. delegation immediately threw the hall into a flurry by emphatically repeating: 4, 6, 11, and therefore 22; 5, 9, hence 22; 3, 7, 2, 11, from which it followed that 22 and only 22!! Someone jumped up, saying yes but 5, and what about 6, 18, or 4 for that matter; Hazelton countered this objection with the crushing retort that, either way, 22. I turned to the number key in his paper and discovered that 22 meant the end of the world… [The Futurological Congress, Stanislaw Lem, translated from the Polish by Michael Kandel, Futura 1977]

Page 64:

F I N: Words to/from the Wise

FYI: Flow-based programming has been re-discovered/re-invented several times by different communities. Here is an “IBM practitioner’s view”:

– Flow-based Programming, http://www.jpaulmorrison.com/fbp/… In "Flow-Based Programming" (FBP), applications are defined as networks of "black box" processes, which exchange data across predefined connections. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. It is thus naturally component-oriented. To describe this capability, the distinguished IBM engineer, Nate Edwards, coined the term "configurable modularity", which he calls the basis of all true engineered systems. When using FBP, the application developer works with flows of data, being processed asynchronously, rather than the conventional single hierarchy of sequential, procedural code.   It is thus a good fit with multiprocessor computers, and also with modern embedded software. In many ways, an FBP application resembles more closely a real-life factory, where items travel from station to station, undergoing various transformations.  Think of a soft drink bottling factory, where bottles are filled at one station, capped at the next and labelled at yet another one.  FBP is therefore highly visual: it is quite hard to work with an FBP application without having the picture laid out on one's desk, or up on a screen!  For an example, see Sample DrawFlow Diagram. Strangely though, in spite of being at the leading edge of application development, it is also simple enough that trainee programmers can pick it up, and it is a much better match with the primitives of data processing than the conventional primitives of procedural languages. The key, of course (and perhaps the reason why it hasn't caught on more widely), is that it involves a significant paradigm shift that changes the way you look at programming, and once you have made this transition, you find you can never go back! FBP seems to dovetail neatly with a concept that I call "smart data". There is a section on this in stuff about the author. 
A new web page on this topic has just been uploaded - see "Smart Data" and Business Data Types - and we will be publishing more as it develops. …