Kepler: Towards a Grid-Enabled System for Scientific Workflows

Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher*, Steve Mock

*[email protected]
San Diego Supercomputer Center (SDSC)
University of California, San Diego (UCSD)


B. Ludäscher et al. – Grid-Enabling Kepler 2

Outline

• Motivation: Scientific Workflows (SEEK, SDM, GEON, ..)

• Current Features of the Kepler Scientific Workflows System

• Extending Kepler:
  – Grid-Enabling Kepler:
    • 3rd party transfer
  – WF planning & optimization
    • Shipping and Handling Algebra (SHA)
    • Web Service Composition as Declarative Query Plans
  – Semantic Types for Scientific Workflows

• Conclusions

Kepler Team, Projects, Sponsors

• Ilkay Altintas (SDM)
• Chad Berkley (SEEK)
• Shawn Bowers (SEEK)
• Jeffrey Grethe (BIRN)
• Christopher H. Brooks (Ptolemy II)
• Zhengang Cheng (SDM)
• Efrat Jaeger (GEON)
• Matt Jones (SEEK)
• Edward A. Lee (Ptolemy II)
• Kai Lin (GEON)
• Bertram Ludäscher (BIRN, GEON, SDM, SEEK)
• Steve Mock (NMI)
• Steve Neuendorffer (Ptolemy II)
• Jing Tao (SEEK)
• Mladen Vouk (SDM)
• Yang Zhao (Ptolemy II)
• …

Example: SEEK – Science Environment for Ecological Knowledge (large NSF ITR)

Architecture Overview (cf. Cyberinfrastructure)

• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – application-/upperware
• Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware
• EcoGrid
  – Access to ecology data and tools
  – middle-/underware

Ecology: GARP Analysis Pipeline for Invasive Species Prediction

[Figure: workflow diagram. Actors: EcoGridQuery, LayerIntegration, SampleData, DataCalculation, MapGeneration, Validation, GenerateMetadata, ArchiveToEcogrid, fed by registered EcoGrid databases and steered by a User; annotations +A1, +A2, +A3.]

Data products in the figure:
(a) Species presence & absence points (native range; invasion area)
(b) Environmental layers (native range; invasion area)
(c) Integrated layers (native range; invasion area)
(d) Training sample; test sample
(e) GARP rule set
(f) Native range prediction map; invasion area prediction map
(g) Model quality parameter
(h) Selected prediction maps

Source: NSF SEEK (Deana Pennington et al., UNM)

Genomics Example: Promoter Identification Workflow (PIW)

Source: Matt Coleman (LLNL)

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Scientific “Workflows”: Some Findings

• More dataflow than (business control-/) workflow
  – DiscoveryNet, Kepler, SCIRun, Scitegic, Taverna, Triana, …
• Need for “programming extension”
  – Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)
• Need for abstraction and nested workflows
• Need for data transformations (WS1 → DT → WS2)
• Need for rich user interaction & workflow steering:
  – pause / revise / resume
  – select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF
• Need for high-throughput transfers (“grid-enabling”, “streaming”)
• Need for persistence of intermediate products and provenance

In a Flux: Workflow “Standards”

Source: W.M.P. van der Aalst et al.
http://tmitwww.tm.tue.nl/research/patterns/
http://tmitwww.tm.tue.nl/staff/wvdaalst/Publications/publications.html

Commercial & Open Source Scientific “Workflow” (well, Dataflow) Systems

Kensington Discovery Edition from InforSense

Taverna

Triana

SCIRun: Problem Solving Environments for Large-Scale Scientific Computing

• SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computations
• New collaboration under Kepler/SDM
• Component model, based on generalized dataflow programming

Source: Steve Parker (cs.utah.edu)

Our Starting Point: Ptolemy II & Dataflow Process Networks

[Figure callouts: see! try! read!]

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Why Ptolemy II?

• Ptolemy II Objective:
  – “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
• Data & process oriented: dataflow process networks
• Natural data streaming support
• User orientation
  – “application-ware” (not middle-/under-ware)
  – Workflow design & exec console (Vergil GUI)
• PRAGMATICS
  – mature, actively maintained, well-documented (500+ pp)
  – open source system
  – developed across multiple projects (NSF/ITRs SEEK and GEON, DOE SciDAC SDM, …)
  – hoping to leverage e-sister projects (e.g. Taverna, …)

Dataflow Process Networks: Putting Computation Models (“Orchestration”) first!

• Synchronous Dataflow Network (SDF)
  – Statically schedulable single-threaded dataflow
    • Can execute multi-threaded, but the firing sequence is known in advance
  – Maximally well-behaved, but also limited expressiveness
• Process Network (PN)
  – Multi-threaded dynamically scheduled dataflow
  – More expressive than SDF (dynamic token rate prevents static scheduling)
  – Natural streaming model
• Other Execution Models (“Domains”)
  – Implemented through different “Directors”

[Figure: two actors with typed i/o ports connected by a FIFO channel; advanced push/pull]
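The SDF point above (the firing sequence is known in advance) can be illustrated with a minimal Python sketch. It assumes a chain of just two actors, a producer A and a consumer B, with fixed token rates; the function names and the rates are illustrative assumptions and not part of the Ptolemy II API.

```python
from math import gcd

def sdf_repetitions(produce: int, consume: int) -> tuple[int, int]:
    """Solve the balance equation produce * a == consume * b for the
    smallest positive firing counts (a, b) of producer A and consumer B."""
    g = gcd(produce, consume)
    return consume // g, produce // g

def static_schedule(produce: int, consume: int) -> list[str]:
    """Build one period of a schedule by simulating the FIFO buffer:
    fire B whenever enough tokens are queued, otherwise fire A."""
    a, b = sdf_repetitions(produce, consume)
    schedule, tokens = [], 0
    fired_a = fired_b = 0
    while fired_a < a or fired_b < b:
        if tokens >= consume and fired_b < b:
            tokens -= consume
            fired_b += 1
            schedule.append("B")
        else:
            tokens += produce
            fired_a += 1
            schedule.append("A")
    return schedule

# A produces 2 tokens per firing, B consumes 3 per firing.
print(static_schedule(2, 3))
```

Because the rates are constants, the whole schedule is computed before execution starts. With data-dependent rates, as in PN, no such precomputation is possible, which is the trade-off the slide describes.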

Actor-/Dataflow Orientation vs. Object-/Control-flow Orientation

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/


Marrying or Divorcing Control- & Dataflow

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

Overview: Scientific Workflows in Kepler

• Modeling and Workflow Design
• Web services = individual components (“actors”)
• “Minute-Made” Application Integration:
  – Plugging in and harvesting web service components is easy and fast
• Rich SWF modeling semantics (“directors”):
  – Different and precise dataflow models of computation
  – Clear and composable component interaction semantics
  ⇒ Web service composition and application integration tool
• Coming soon:
  – Shrink-wrapped, pre-packaged “Kepler-to-Go”
  – Structural and semantic typing (better design support)
  – Grid-enabled web services (for big data, big computations, …)
  – Different deployment models (web service, web site, applet, …)

The KEPLER GUI: Vergil (Steve Neuendorffer, Ptolemy II)

Drag and drop utilities, director and actor libraries.


Running a Genomics WF (Ilkay Altintas, SDM)

Support for Multiple Workflow Granularities

[Figure labels: Abstraction: Sand to Rocks; Boulders; Sand; Powder; Plumbing]


Directors and Combining Different Component Interaction Semantics

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/


Application Examples: Mineral Classification with Kepler … (Efrat Jaeger, GEON)


… inside the Classifier


Standard BrowserUI: Client-Side SVG


SWF Reengineering (Ashraf, Efrat, Kai, GEON)


DataMapper Sub-Workflow

Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)

Distributed Workflows in KEPLER

• Web and Grid Service plug-ins
  – WSDL (now) and Grid services (stay tuned …)
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
  – SSH, SCP, SDSC SRB, OGS?-???… coming
• WS Harvester
  – Import query-defined WS operations as Kepler actors
• XSLT and XQuery Data Transformers
  – to link not “designed-to-fit” web services
• WS-deployment interface (planned)


Generic Web Service Actor (Ilkay Altintas)

Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.

Configure - select service operation
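The idea of an actor that customizes itself from a service description can be sketched as follows. This is only an illustration: the dictionary `FAKE_WSDL` stands in for a parsed WSDL, the class and method names are hypothetical, and nothing here calls a real web service or reflects Kepler's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-in for a parsed WSDL:
# operation name -> (input names, output names, local implementation).
FAKE_WSDL = {
    "reverse": (["text"], ["result"], lambda text: {"result": text[::-1]}),
    "add": (["x", "y"], ["sum"], lambda x, y: {"sum": x + y}),
}

@dataclass
class GenericServiceActor:
    service: dict
    operation: str
    input_ports: list = field(default_factory=list)
    output_ports: list = field(default_factory=list)
    _impl: Optional[Callable] = None

    def configure(self) -> None:
        """Specialize the actor: create ports matching the chosen operation."""
        inputs, outputs, impl = self.service[self.operation]
        self.input_ports = list(inputs)
        self.output_ports = list(outputs)
        self._impl = impl

    def fire(self, **tokens):
        """Consume one token per input port and return the output tokens."""
        if self._impl is None:
            raise RuntimeError("actor is not configured yet")
        missing = [p for p in self.input_ports if p not in tokens]
        if missing:
            raise ValueError(f"missing input tokens: {missing}")
        return self._impl(**{p: tokens[p] for p in self.input_ports})

actor = GenericServiceActor(FAKE_WSDL, "add")
actor.configure()
print(actor.input_ports, actor.fire(x=2, y=3))
```

The same class serves every operation in the description; only `configure` decides which ports exist, which mirrors the generic-then-specialized life cycle shown on the following slides.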

Set Parameters and Commit


Specialized WS Actor (after instantiation)

Web Service Harvester (Ilkay Altintas, SDM)

• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.

Composing 3rd-Party WSs (NMI, Steve Mock)

[Figure callouts: Output of previous web service; User interaction & Transformations; Input of next web service]


A Special Generic Ingestion Actor for EML Data (SEEK, Chad Berkley)

Ingests any data format described by EML metadata

Converts raw data to Ptolemy format

Data can then be operated on with other actors


Wrapping Legacy Applications

Promoter Identification Workflow (PIW)

Source: Matt Coleman (LLNL)

Promoter Identification Workflow in Ptolemy-II [SSDBM’03]

Execution Semantics

[Figure callouts on the Ptolemy-II PIW:]
• hand-crafted control solution; also: forces sequential execution!
• designed to fit
• hand-crafted web-service actor
• complex backward control-flow
• no data transformations available

Promoter Identification Workflow in FP

genBankG       :: GeneId -> GeneSeq
genBankP       :: PromoterId -> PromoterSeq
blast          :: GeneSeq -> [PromoterId]
promoterRegion :: PromoterSeq -> PromoterRegion
transfac       :: PromoterRegion -> [TFBS]
gpr2str        :: (PromoterId, PromoterRegion) -> String

d0 = Gid "7"                -- start with some gene-id
d1 = genBankG d0            -- get its gene sequence from GenBank
d2 = blast d1               -- BLAST to get a list of potential promoters
d3 = map genBankP d2        -- get list of promoter sequences
d4 = map promoterRegion d3  -- compute list of promoter regions and ...
d5 = map transfac d4        -- ... get transcription factor binding sites
d6 = zip d2 d4              -- create list of pairs promoter-id/region
d7 = map gpr2str d6         -- pretty print into a list of strings
d8 = concat d7              -- concat into a single "file"
d9 = putStr d8              -- output that file

Cleaned up Process Network PIW

• Back to purely functional dataflow process network (= also a data streaming model!)
• Re-introducing map(f) to Ptolemy-II (was there in PT Classic)
  – no control-flow spaghetti
  – data-intensive apps
  – free concurrent execution
  – free type checking
  – automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]

[Figure callouts:]
• map(f)-style iterators
• Powerful type checking
• Generic, declarative “programming” constructs
• Generic data transformation actors
• Forward-only, abstractable sub-workflow piw(GeneId)
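The step from piw(GeneId) to PIW := map(piw) over [GeneId], and the claim that concurrency comes for free, can be sketched in Python. The body of `piw` is a hypothetical stand-in for the real sub-workflow; the sketch assumes it is side-effect free, which is what makes the lifting safe.

```python
from concurrent.futures import ThreadPoolExecutor

def piw(gene_id: str) -> str:
    """Hypothetical stand-in for the forward-only sub-workflow piw(GeneId)."""
    return f"promoters-of-{gene_id}"

def PIW(gene_ids: list[str]) -> list[str]:
    """PIW := map(piw) over [GeneId]. The calls are independent, so they
    can run concurrently; results keep the order of the inputs."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(piw, gene_ids))

print(PIW(["7", "8", "9"]))
```

No control logic had to be written to parallelize the workflow: the map construct alone tells the runtime that the per-gene computations do not depend on each other.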

Optimization by Declarative Rewriting I

• PIW as a declarative, referentially transparent functional process
  – optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)
• Technical report & PIW specification in Haskell

[Figure callouts: map(f o g) instead of map(f) o map(g); combination of map and zip]

http://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
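The rewriting law map(f o g) = map(f) o map(g) can be checked on a small example. The sketch below is in Python rather than the Haskell of the technical report, and the functions `f` and `g` are arbitrary illustrative choices; the law only holds when both are free of side effects.

```python
def compose(f, g):
    """Function composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

def map_twice(f, g, xs):
    """map(f) o map(g): two passes and one intermediate list."""
    intermediate = [g(x) for x in xs]
    return [f(y) for y in intermediate]

def map_fused(f, g, xs):
    """map(f o g): a single pass with no intermediate list."""
    h = compose(f, g)
    return [h(x) for x in xs]

f = lambda x: x + 1
g = lambda x: x * 2
data = [1, 2, 3]
print(map_twice(f, g, data) == map_fused(f, g, data))
```

In a workflow setting the fused form saves materializing and shipping the intermediate list, which matters when each element is a large data product.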

Optimizing II: Streams & Pipelines

• Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. mapS f • mapS g ⇒ mapS (f • g)

Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney

Middle/Underware Access: Querying Databases

• Database connection actor:
  – Opening a database connection and passing it to all actors accessing this database.
• Database query actor:
  – A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
  – A new IOPort type and a token to distinguish a database connection from any general type.

Database Connection Actor

• OpenDBConnection actor:
  – Input: database connection information
  – Output: DBConnectionToken (reference to a DB connection instance, via a DBConnection output port)

Database Query Actor

• Database Query actor:
  – Input: SQL query string and a DB connection token
  – Parameters:
    • output type: XML, Record, or String
    • tuple-at-a-time vs set-at-a-time
  – Process:
    • execute query
    • produce results according to parameters
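The division of labor between the two database actors can be sketched with Python's standard `sqlite3` module. This is an illustration under stated assumptions, not Kepler code: the function names are hypothetical, an in-memory SQLite database replaces the real data source, and only the Record and String output types are covered (XML is omitted).

```python
import sqlite3

def open_db_connection(database: str) -> sqlite3.Connection:
    """OpenDBConnection: turn connection information into a connection
    'token' that downstream actors can share."""
    return sqlite3.connect(database)

def database_query(conn, sql, output_type="record", tuple_at_a_time=False):
    """Database Query actor: run the SQL on the shared connection and emit
    rows either one at a time (generator) or as a whole set (list)."""
    if output_type not in ("record", "string"):
        raise ValueError("output_type must be 'record' or 'string'")
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]

    def convert(row):
        record = dict(zip(names, row))
        return record if output_type == "record" else str(record)

    rows = (convert(row) for row in cursor)
    return rows if tuple_at_a_time else list(rows)

conn = open_db_connection(":memory:")
conn.execute("CREATE TABLE gene (id TEXT, name TEXT)")
conn.execute("INSERT INTO gene VALUES ('7', 'demo')")
print(database_query(conn, "SELECT id, name FROM gene"))
```

Tuple-at-a-time delivery suits a streaming (PN-style) workflow, while set-at-a-time delivery suits actors that need the complete result before they can fire.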


Querying Example

An (oversimplified) Model of the Grid

• Hosts: {h1, h2, h3, …}
• Data@Hosts: d1@{hi}, d2@{hj}, …
• Functions@Hosts: f1@{hi}, f2@{hj}, …
• Given: data/workflow:
  • … as a functional plan: […; Y := f(X); Z := g(Y); …]
  • … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
• Find Host Assignment: di → hi, fj → hj for all di, fj
  … s.t. […; d3@h3 := f@h2(d1@h1), …] is a valid plan

[Figure: X → f → Y → g → Z]

Shipping and Handling Algebra (SHA)

Logical view:

plan Y@C = F@A of X@B

Physical view: SHA plans

1. [ X@B to A, Y@A := F@A(X@A), Y@A to C ]
2. [ F@A => B, Y@B := F@B(X@B), Y@B to C ]
3. [ X@B to C, F@A => C, Y@C := F@C(X@C) ]

[Figure: four diagrams of f@A with input x@b and output y@c: the logical view and the physical plans (1), (2), (3)]
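Choosing among the three SHA plans can be sketched as a cost comparison. The cost model below is an assumption made for illustration only: cost equals the total size shipped, every link costs the same, and computation is free. The function names are hypothetical.

```python
def sha_plan_costs(size_x: int, size_f: int, size_y: int) -> dict:
    """Transfer cost of the three plans for Y@C = F@A of X@B,
    assuming cost == amount of data (or code) shipped."""
    return {
        1: size_x + size_y,  # X@B to A, compute at A, Y@A to C
        2: size_f + size_y,  # F@A => B, compute at B, Y@B to C
        3: size_x + size_f,  # X@B to C, F@A => C, compute at C
    }

def cheapest_plan(size_x: int, size_f: int, size_y: int) -> int:
    """Return the number of the plan with the lowest transfer cost."""
    costs = sha_plan_costs(size_x, size_f, size_y)
    return min(costs, key=costs.get)

# Huge input, tiny function, small result: ship the function to the data.
print(cheapest_plan(size_x=1000, size_f=1, size_y=10))
```

With a large input and a small function, plan 2 wins, which matches the intuition of moving computation to the data. A realistic planner would also have to account for where functions are allowed to run and for per-link bandwidth.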

Grid-Enabling PTII: Handles

[Figure: actors A and B in Kepler space, grid nodes GA and GB in Grid space, connected by the numbered steps 1–7 below]

1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)

Example: &X = “GA.17”, *X = <some_huge_file>

Candidate Formalisms:
• GridFTP
• SSH, SCP
• SDSC SRB
• OGS?-??? … WSRF?

Logical token transfer (3) requires get_handle (1, 2); then exec_handle (4, 5, 6, 7) for completion.
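The seven-step handle protocol can be simulated in a few lines of Python. The class and function names are hypothetical, in-memory dictionaries stand in for grid storage, and no real transfer mechanism (GridFTP, SRB, and so on) is involved; the starting handle number 17 only mirrors the slide's example.

```python
class GridNode:
    """Hypothetical grid-space storage node (GA or GB in the slide)."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}      # handle -> payload
        self._next_id = 17   # chosen so the first handle matches "GA.17"

    def get_handle(self, payload) -> str:
        """Steps 1-2: register the payload and return its handle &X."""
        handle = f"{self.name}.{self._next_id}"
        self._next_id += 1
        self.store[handle] = payload
        return handle

    def fetch(self, handle: str, source: "GridNode") -> str:
        """Steps 5-7: pull *X from the source node, then report done."""
        self.store[handle] = source.store[handle]
        return "done"

def transfer(payload, ga: GridNode, gb: GridNode):
    """Move a token from actor A to actor B by shipping only its handle."""
    log = []
    handle = ga.get_handle(payload)   # 1, 2: A obtains &X from GA
    log.append(("A->B", handle))      # 3: only the handle crosses Kepler space
    status = gb.fetch(handle, ga)     # 4: B asks GB, which runs 5-7 grid-side
    log.append(("GB->B", status))
    return handle, log

ga, gb = GridNode("GA"), GridNode("GB")
print(transfer("<some_huge_file>", ga, gb))
```

The payload itself never appears in the Kepler-space log; the workflow engine only ever sees the small handle, which is the point of third-party transfer.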

Extensions: Semantic Type

• Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
• Application: e.g., design support:
  – smart/semi-automatic wiring, generation of “massaging actors”

[Figure: actor m1 (normalize) with ports p3 and p4]
• Takes Abundance Count Measurements for Life Stages
• Returns Mortality Rate Derived Measurements for Life Stages


Semantic Types

• The semantic type signature
  – Type expressions over the (OWL) ontology

[Figure: actor m1 (normalize) with ports p3 and p4]

SemType m1 ::
  Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
  ->
  DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
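How such signatures could support wiring decisions can be sketched as a subsumption check. Everything beyond the two type expressions of m1 is an assumption for illustration: a semantic type is modeled as a set of conjoined terms, and the single subclass fact (DerivedObservation is an Observation) is assumed rather than taken from the SEEK ontology.

```python
# Assumed subclass facts (illustration only, not from the actual ontology).
SUBCLASS_OF = {"DerivedObservation": "Observation"}

def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or reaches it via SUBCLASS_OF."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False

def compatible(output_type: set, input_type: set) -> bool:
    """An output may feed an input if every term the input requires is
    satisfied by some term the output provides."""
    return all(any(is_a(have, need) for have in output_type)
               for need in input_type)

M1_IN = {"Observation", "itemMeasured.AbundanceCount",
         "hasContext.appliesTo.LifeStageProperty"}
M1_OUT = {"DerivedObservation", "itemMeasured.MortalityRate",
          "hasContext.appliesTo.LifeStageProperty"}

# m1's output cannot be fed back into m1: no abundance count is provided.
print(compatible(M1_OUT, M1_IN))
```

A design tool could run such a check whenever two ports are connected and, on failure, suggest a "massaging actor" that bridges the gap.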

Extended Type System (here: OWL Semantic Types)

SemType m1 ::
  Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
  ->
  DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Substructure association:
XML raw-data =(X)Query=> object model =link=> OWL ontology


Semantic Types for Scientific Workflows


Deriving Data Transformations from Semantic Service Registration

[Bowers-Ludaescher,DILS’04]


Structural and Semantic Mappings

[Bowers-Ludaescher,DILS’04]

Workflow Planning as Planning Queries with Limited Access Patterns

• User query Q:
  answer(ISBN, Author, Title) ← book(ISBN, Author, Title), catalog(ISBN, Author), not library(ISBN).
• Limited (web service) Access Patterns (API)
  – Src1.books: in: ISBN out: Author, Title
  – Src1.books: in: Author out: ISBN, Title
  – Src2.catalog: in: {} out: ISBN, Author
  – Src3.library: in: {} out: ISBN
• Q is not executable, but feasible (equivalent to executable Q’: catalog ; book ; not library)

ICDE (poster), EDBT, PODS (papers), [Nash-Ludaescher,2004]
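Finding the executable ordering Q’ for the book example can be sketched with a greedy reordering. This is a simplification under stated assumptions: it only tries to reorder the literals of Q, whereas feasibility in the cited work is the more general question of equivalence to some executable query. The data structures and the function name are hypothetical.

```python
# Literal: (relation name, variables, negated?)
BODY = [
    ("book", ("ISBN", "Author", "Title"), False),
    ("catalog", ("ISBN", "Author"), False),
    ("library", ("ISBN",), True),
]

# Access patterns: for each relation, the alternative sets of variables
# that must already be bound before the source can be called.
PATTERNS = {
    "book": [{"ISBN"}, {"Author"}],
    "catalog": [set()],
    "library": [set()],
}

def executable_order(body, patterns):
    """Greedily pick a literal whose inputs are bound; a negated literal
    additionally needs all of its variables bound. Returns None on failure."""
    bound, order, pending = set(), [], list(body)
    while pending:
        for literal in pending:
            name, variables, negated = literal
            callable_now = any(needed <= bound for needed in patterns[name])
            safe = not negated or set(variables) <= bound
            if callable_now and safe:
                order.append(("not " if negated else "") + name)
                bound |= set(variables)
                pending.remove(literal)
                break
        else:
            return None  # no remaining literal can be called
    return order

print(executable_order(BODY, PATTERNS))
```

For the query on the slide the sketch yields the order catalog, book, not library: calling catalog first binds ISBN and Author, which makes book callable and the negated library test safe.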

Conclusions

• Summary
  – Kepler Scientific Workflow System
  – Open source, cross-project collaboration (SEEK, GEON, SDM, …)
  – Actor & Dataflow-oriented Modeling, Design, Execution (Ptolemy II heritage)
  – Prototyping, static analysis, web services, data transformations
• Next Steps
  – First official release (“Kepler-to-Go”) April/May ’04
    • e-Science meeting NeSC, Edinburgh
  – Grid-enabling
    • 3rd party transfer, planning, optimization, …
  – Semantic Typing [DILS’04]
  – Provenance, Fault tolerance, …
  – Link-Up w/ e.g. Taverna, Pegasus, …
  – Become a member or co-developer (You!)