Scientific Data & Workflow Engineeringusers.sdsc.edu/~ludaesch/Paper/ludaescher-nsf-11-04.pdf ·...
Transcript of Scientific Data & Workflow Engineeringusers.sdsc.edu/~ludaesch/Paper/ludaescher-nsf-11-04.pdf ·...
1
Scientific Data & Workflow EngineeringScientific Data & Workflow EngineeringPreliminary Notes from the Preliminary Notes from the CyberinfrastructureCyberinfrastructure TrenchesTrenches
Bertram Bertram LudLudääscherscher
San Diego Supercomputer Center
Associate ProfessorAssociate ProfessorDept. of Computer Science & Genome CenterDept. of Computer Science & Genome Center
University of California, DavisUniversity of California, Davis
FellowFellowSan Diego Supercomputer CenterSan Diego Supercomputer Center
University of California, San DiegoUniversity of California, San Diego
UC DAVISDepartment ofComputer Science
2 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OutlineOutline
•• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
•• Scientific Data IntegrationScientific Data Integration
•• Scientific Workflow ManagementScientific Workflow Management
•• Links & Crystallization PointsLinks & Crystallization Points
•• SummarySummary
2
3 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Science Environment for Ecological Science Environment for Ecological Knowledge (SEEK) OverviewKnowledge (SEEK) Overview
•• Domain Science DriverDomain Science Driver– Ecology (LTER), biodiversity, …
•• Analysis & Modeling SystemAnalysis & Modeling System– Design & execution of ecological
models & analysis (“scientific workflows”)
– {application,upper}-wareKepler system
•• Semantic Mediation SystemSemantic Mediation System– Data Integration of hard-to-
relate sources and processes– Semantic Types and Ontologies– upper middleware
Sparrow Toolkit•• EcoGridEcoGrid
– Access to ecology data and tools– {middle,under}-ware
unified API to SRB/MCAT, MetaCat, DiGIR, … datasets sample CS problem [DILS’04]
4 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Common CI Infrastructure PiecesCommon CI Infrastructure Pieces•• Other CIOther CI--projects (e.g. GEON, projects (e.g. GEON, …… ) have similar ) have similar
serviceservice--oriented architectures: oriented architectures: – Seamless and uniform data access (“Data-Grid”)
• data & metadata registry– distributed and high performance computing
platform (“Compute-Grid”)• service registry
– Federated, integrated, mediated databases• often use of semantic extensions (e.g. ontologies)
– User-friendly workbench / problem-solving environment
scientific workflows
•• add to this sensors, observing systems add to this sensors, observing systems ……
3
5 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
…… Example: Example: RealtimeRealtime Environment for Environment for Analytical Processing Analytical Processing (REAP vision)(REAP vision)
6 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
The Great Unified SystemThe Great Unified System
•• Many engineering and CS challenges!Many engineering and CS challenges!… we’ll see some …
•• Our focus:Our focus:– Scientific data integration
• How to associate, mediate, integrate complex scientific data?– Scientific workflows
• How to devise larger scientific workflows for process automation from individual components (e.g. web services)?
•• Disclaimer:Disclaimer:… often scratching the surface; see references &
research literature for details …
4
7 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OutlineOutline
•• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
•• Scientific Data IntegrationScientific Data Integration
•• Scientific Workflow ManagementScientific Workflow Management
•• Links & Crystallization PointsLinks & Crystallization Points
•• SummarySummary
8 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
An Online ShopperAn Online Shopper’’s Information Integration s Information Integration ProblemProblem
El Cheapo: El Cheapo: ““Where can I get the cheapest copy (including shipping cost) of Where can I get the cheapest copy (including shipping cost) of WittgensteinWittgenstein’’s s TractatusTractatus LogicusLogicus--PhilosophicusPhilosophicus within a week?within a week?””
??Information Information IntegrationIntegration
addall.comaddall.com
“One-World”Mediation
amazon.comamazon.comamazon.com A1books.comA1books.comA1books.comhalf.comhalf.comhalf.combarnes&noble.combarnes&noble.combarnes&noble.com
Mediator (virtual DB)Mediator (virtual DB)(vs. (vs. DatawarehouseDatawarehouse))NOTE: nonNOTE: non--trivial trivial
data engineering challenges!data engineering challenges!
5
9 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
A Home BuyerA Home Buyer’’s Information Integration Problems Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood
with below-average crime rate and diverse population?
?Information Integration
RealtorRealtor DemographicsDemographicsSchool RankingsSchool RankingsCrime StatsCrime Stats
“Multiple-Worlds”Mediation
10 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
A NeuroscientistA Neuroscientist’’s Information s Information Integration ProblemIntegration ProblemWhat is the cerebellar distribution of rat proteins with more than
70% homology with human NCS-1? Any structure specificity?How about other rodents?
?Information Integration
protein localization(NCMIR)
neurotransmission(SENSELAB)
sequence info(CaPROT)
morphometry(SYNAPSE)
“Complex Multiple-Worlds”
Mediation
Biomedical InformaticsBiomedical InformaticsResearch NetworkResearch Networkhttp://http://nbirn.netnbirn.net
Inter-source links:• unclear for the non-scientists• hard for the scientist
6
11 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
12 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Interoperability & Integration ChallengesInteroperability & Integration Challenges• System aspects: “Grid” Middleware
• distributed data & computing, SOA• web services, WSDL/SOAP, WSRF, OGSA, …• sources = functions, files, data sets …
• Syntax & Structure: (XML-Based) Data Mediators
• wrapping, restructuring • (XML) queries and views• sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
• conceptual models and declarative views • Knowledge Representation: ontologies,
description logics (RDF(S),OWL ...)• sources = knowledge bases (DB+CMs+ICs)
• Synthesis: Scientific Workflow Design & Execution
• Composition of declarative and procedural components into larger workflows
• (re)sources = services, processes, actors, …
reconciling reconciling SS55
heterogeneitiesheterogeneities““gluinggluing”” together together resources resources bridging information and bridging information and knowledge gaps knowledge gaps computationallycomputationally
7
13 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Information Integration Challenges: Information Integration Challenges: SS44 HeterogeneitiesHeterogeneities•• SSystemystem aspectsaspects
– platforms, devices, data & service distribution, APIs, protocols, …Grid middleware technologies
+ e.g. single sign-on, platform independence, transparent use of remote resources, …
•• SSyntaxyntax & & SStructuretructure– heterogeneous data formats (one for each tool ...)– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …) – heterogeneous schemas (one for each DB ...)
Database mediation technologies+ XML-based data exchange, integrated views, transparent query rewriting, …
•• SSemanticsemantics– descriptive metadata, different terminologies, “hidden” semantics
(context), implicit assumptions, …Knowledge representation & semantic mediation technologies
+ “smart” data discovery & integration+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways!
14 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Information Integration Challenges: Information Integration Challenges: SS55 HeterogeneitiesHeterogeneities•• SSynthesisynthesis of applications, analysis tools, data & of applications, analysis tools, data &
query components, query components, …… into into ““scientific workflowsscientific workflows””– How to make use of these wonderful things & put them
together to solve a scientist’s problem?Scientific Problem Solving Environments (Scientific Problem Solving Environments (PSEsPSEs))
Portals,Workbench (“scientist’s view”)+ ontology-enhanced data registration, discovery,
manipulation+ creation and registration of new data products from
existing ones, …Scientific Workflow System (“engineer’s view”)
+ for designing, re-engineering, deploying analysis pipelines and scientific workflows; a tool to make new tools …
+ e.g., creation of new datasets from existing ones, dataset registration, …
8
15 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Information Integration from a Information Integration from a Database Perspective Database Perspective
•• Information Integration ProblemInformation Integration Problem– Given: data sources S1, ..., Sk (databases, web sites, ...)
and user questions Q1,..., Qn that can –in principle– be answered using the information in the Si
– Find: the answers to Q1, ..., Qn
•• The Database Perspective: The Database Perspective: source = source = ““databasedatabase””⇒ Si has a schema (relational, XML, OO, ...) ⇒ Si can be queried⇒ define virtual (or materialized) integrated (or global)
view G over local sources S1 ,..., Sk using database query languages (SQL, XQuery,...)
⇒ questions become queries Qi against G(S1,..., Sk)
16 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
5. Post processing5. Post processing2. Query rewriting2. Query rewriting
Standard (XMLStandard (XML--Based) Mediator ArchitectureBased) Mediator Architecture
MEDIATORMEDIATOR
Integrated Global(XML) View G
Integrated ViewDefinition
G(..)← S1(..)…Sk(..)
USER/ClientUSER/Client
1. Query Q ( G (S1. Query Q ( G (S11,..., ,..., SSkk) )) )
S1
Wrapper
(XML) View
S2
Wrapper
(XML) View
Sk
Wrapper
(XML) Viewweb services as wrapper APIs
3. Q1 Q2 Q33. Q1 Q2 Q34. {answers(Q1)} {answers(Q2)} 4. {answers(Q1)} {answers(Q2)} {answers(Q3)}{answers(Q3)}
6. {answers(Q)}6. {answers(Q)}
9
17 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Query Planning in Data IntegrationQuery Planning in Data Integration•• GivenGiven: :
– Declarative user query Q: answer(…) …G ...– … & { G … S … } global-as-view (GAV)– … & { S … G … } local-as-view (LAV)– … & { ic(…) … S … G… } integrity constraints (ICs)
•• FindFind: : – equivalent (or minimal containing, maximal contained)
query plan Q’: answer(…) … S …query rewriting (logical/calculus, algebraic, physical levels)
•• ResultsResults::– A variety of results/algorithms; depending on classes of
queries, views, and ICs: P, NP, … , undecidable– hot research area in core CS (database community)
18 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Scientific Data Integration using Scientific Data Integration using Semantic ExtensionsSemantic Extensions
10
19 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
20 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Example: Geologic Map IntegrationExample: Geologic Map Integration
•• GivenGiven: : – Geologic maps from different state geological
surveys (shapefiles w/ different data schemas)– Different ontologies:
• Geologic age ontology (e.g. USGS)• Rock classification ontologies:
– Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
– Single hierarchy from British Geological Survey (BGS)
•• ProblemProblem::– Support uniform queries across all map – … using different ontologies– Support registration w/ ontology A, querying w/
ontology B
11
21 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
IntegrationSchema
Schema Integration Schema Integration ((““registeringregistering”” local local schemas to the global schema)schemas to the global schema)
Arizona
Colorado
Utah
Nevada
Wyoming
New Mexico
Montana E.
Idaho
Montana West
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
……AgeAge
……FormationFormation
TextureTexture……
FabricFabric……
CompositionComposition……
AgeAge……
FormationFormation……
TextureTexture……
FabricFabric……
CompositionComposition……
AgeAge……
FormationFormation……
ABBREV
PERIOD
PERIOD
NAME
PERIOD
TYPE
TIME_UNIT
FMATN
PERIOD
NAME
PERIOD
NAME
FORMATION
PERIOD
FORMATION
FORMATION
LITHOLOGY
LITHOLOGY
AGE
AGE
andesitic sandstone
Livingston formation
Tertiary-Cretaceous
Sources
Sources
22 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
MultihierarchicalMultihierarchical Rock Classification Rock Classification ““OntologyOntology””(Taxonomies) for (Taxonomies) for ““Thematic QueriesThematic Queries”” (GSC)(GSC)
Composition
Genesis
Fabric
Texture
12
23 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OntologyOntology--Enabled Application Example:Enabled Application Example:Geologic Map IntegrationGeologic Map Integration
Show formations where AGE = ‘Paleozic’(without age ontology)
Show formations where AGE = ‘Paleozic’(without age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
+/- a few hundred million years
domainknowledgedomain
knowledge
Knowledge re
presentation
Geologi
c Age
ONTOLO
GY
NevadaNevada
24 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Querying by Geologic Age Querying by Geologic Age ……
13
25 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Querying by Geologic Age: ResultsQuerying by Geologic Age: Results
26 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Querying by Chemical Composition Querying by Chemical Composition …… (GSC) (GSC)
14
27 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Semantic Mediation Semantic Mediation (via (via ““semantic semantic registrationregistration”” of schemas and ontology articulations)of schemas and ontology articulations)
•• Schema elements and/or data values are associated Schema elements and/or data values are associated with concept expressions from the target ontologywith concept expressions from the target ontology
conceptual queries “through” the ontology•• Articulation ontology Articulation ontology
source registration to A, querying through B•• Semantic mediation: query rewriting w/ Semantic mediation: query rewriting w/ ontologiesontologies
Database1
Database2
Ontology A
Ontology B
semanticregistration
ontology articulations
Concept-based(“semantic”)
queries
semanticregistration
28 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Different views on State Different views on State Geological MapsGeological Maps
15
29 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Sedimentary Rocks: BGS OntologySedimentary Rocks: BGS Ontology
30 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Sedimentary Rocks: GSC OntologySedimentary Rocks: GSC Ontology
16
31 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Implementation in OWL: Not only Implementation in OWL: Not only ““for the for the machinemachine”” ……
32 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Source Contextualization Source Contextualization through Ontology Refinementthrough Ontology Refinement
In addition to registering(“hanging off”) data relative to
existing concepts, a source may also refine the mediator’s domain map...
⇒ sources can register new concepts at the mediator ...
17
33 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OutlineOutline
•• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
•• Scientific Data IntegrationScientific Data Integration
•• Scientific Workflow ManagementScientific Workflow Management
•• Links & Crystallization PointsLinks & Crystallization Points
•• SummarySummary
34 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
What is a Scientific Workflow (SWF)?What is a Scientific Workflow (SWF)?•• GoalsGoals::
– automate a scientist’s repetitive data management and analysis tasks
– typical phases: • data access, scheduling, generation, transformation,
aggregation, analysis, visualizationdesign, test, share, deploy, execute, reuse, … SWFs
18
35 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Promoter Identification WorkflowPromoter Identification Workflow
Source: Matt Coleman (LLNL)Source: Matt Coleman (LLNL)
36 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Source: NIH BIRN (Jeffrey Grethe, UCSD)Source: NIH BIRN (Jeffrey Grethe, UCSD)
19
37 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Ecology: GARP Analysis Pipeline Ecology: GARP Analysis Pipeline for Invasive Species Predictionfor Invasive Species Prediction
Training sample
(d)
GARPrule set
(e)
Test sample (d)
Integratedlayers
(native range) (c)
Speciespresence &
absence points(native range)
(a)EcoGridQuery
EcoGridQuery
LayerIntegration
LayerIntegration
SampleData
+A3+A2
+A1
DataCalculation
MapGeneration
Validation
User
Validation
MapGeneration
Integrated layers (invasion area) (c)
Species presence &absence points
(invasion area) (a)
Native range
predictionmap (f)
Model qualityparameter (g)
Environmental layers (native
range) (b)
GenerateMetadata
ArchiveTo Ecogrid
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
Environmental layers (invasion
area) (b)
Invasionarea prediction
map (f)
Model qualityparameter (g)
Selectedpredictionmaps (h)
Source: NSF SEEK (Deana Pennington et. al, UNM)Source: NSF SEEK (Deana Pennington et. al, UNM)
38 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
20
39 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Commercial & Open Source Commercial & Open Source Scientific Scientific ““WorkflowWorkflow”” (well (well DataflowDataflow) Systems) Systems
Kensington Discovery Edition from InforSense
Taverna
Triana
40 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
SCIRunSCIRun: Problem Solving Environments for : Problem Solving Environments for LargeLarge--Scale Scientific ComputingScale Scientific Computing
•• SCIRunSCIRun: PSE for interactive construction, debugging, : PSE for interactive construction, debugging, and steering of largeand steering of large--scale scientific computationsscale scientific computations
•• Component model, based on generalized dataflow Component model, based on generalized dataflow programmingprogramming Steve Parker (cs.utah.edu)Steve Parker (cs.utah.edu)
21
41 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Ptolemy II
see!see!see!
try!try!try!
read!read!read!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
42 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Why Ptolemy II (and thus KEPLER)?Why Ptolemy II (and thus KEPLER)?•• Ptolemy II Objective:Ptolemy II Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
•• Dataflow Process Networks w/ natural support for Dataflow Process Networks w/ natural support for abstractionabstraction, , pipeliningpipelining (streaming) (streaming) actoractor--orientation, orientation, actor actor reusereuse
•• UserUser--OrientationOrientation– Workflow design & exec console (Vergil GUI)– “Application/Glue-Ware”
• excellent modeling and design support• run-time support, monitoring, …• not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …)• but middle-/underware is conveniently accessible through actors!
•• PRAGMATICSPRAGMATICS– Ptolemy II is mature, continuously extended & improved, well-documented
(500+pp) – open source system– Ptolemy II folks actively participate in KEPLER
22
43 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
KEPLER/KEPLER/CSPCSP: : CContributors, ontributors, SSponsors, ponsors, PProjectsrojects(or loosely coupled (or loosely coupled CCommunicating ommunicating SSequential equential PPersons ;ersons ;--))
IlkayIlkay AltintasAltintas SDM, ResurgenceSDM, ResurgenceKim Kim BaldridgeBaldridge Resurgence, NMIResurgence, NMIChad Berkley Chad Berkley SEEKSEEKShawn Bowers Shawn Bowers SEEKSEEKTerence Terence CritchlowCritchlow SDMSDMTobin Fricke Tobin Fricke ROADNetROADNetJeffrey Jeffrey GretheGrethe BIRNBIRNChristopher H. Brooks Christopher H. Brooks Ptolemy IIPtolemy IIZhengangZhengang Cheng Cheng SDMSDMDan Higgins Dan Higgins SEEKSEEKEfratEfrat Jaeger Jaeger GEONGEONMatt Jones Matt Jones SEEKSEEKWerner Krebs, Werner Krebs, EOLEOLEdward A. Lee Edward A. Lee Ptolemy IIPtolemy IIKai Lin Kai Lin GEONGEONBertram Bertram LudaescherLudaescher SEEKSEEK, , GEONGEON, , SDMSDM, , ROADNetROADNet, BIRN, BIRNMark Miller Mark Miller EOLEOLSteve Mock Steve Mock NMINMISteve Steve NeuendorfferNeuendorffer Ptolemy IIPtolemy IIJingJing Tao Tao SEEKSEEKMladenMladen VoukVouk SDMSDMXiaowenXiaowen XinXin SDMSDMYang Zhao Yang Zhao Ptolemy IIPtolemy IIBing Zhu Bing Zhu SEEKSEEK••••••
Ptolemy IIPtolemy II
www.keplerwww.kepler--project.orgproject.org
44 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
KEPLER: An Open CollaborationKEPLER: An Open Collaboration•• Initiated by members from NSF SEEK and DOE SDM/SPA; now Initiated by members from NSF SEEK and DOE SDM/SPA; now
several other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, several other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, ……))•• Open Source (BSDOpen Source (BSD--style license)style license)•• Intensive Communications: Intensive Communications:
– Web-archived mailing lists– IRC (!)
•• CoCo--development: development: – via shared CVS repository– joining as a new co-developer (currently):
• get a CVS account (read-only)• local development + contribution via existing KEPLER member• be voted “in” as a member/co-developer
•• Software & social engineeringSoftware & social engineering– How to better accommodate new groups/communities?– How to better accommodate different usage/contribution models (core
dev … special purpose extender … user)?
23
45 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Ptolemy II/KEPLER GUI (Ptolemy II/KEPLER GUI (VergilVergil))“Directors” define the component interaction & execution semantics
Large, polymorphic component (“Actors”) and Directorslibraries (drag & drop)
46 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
KEPLER/Ptolemy II GUI refinedKEPLER/Ptolemy II GUI refinedOntology based actor
(service) and dataset search
Result Display
24
47 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Web Services Web Services Actors Actors (WS Harvester)(WS Harvester)
12
3
4
“Minute-made” (MM) WS-based application integration• Similarly: MM workflow design & sharing w/o implemented components
48 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Some Recent Actor AdditionsSome Recent Actor Additions
25
49 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
An An ““earlyearly”” example: example: Promoter Identification Promoter Identification
SSDBM, AD 2003SSDBM, AD 2003
• Scientist models application as a “workflow” of connected components (“actors”)
• If all components exist, the workflow can be automated/ executed
• Different directors can be used to pick appropriate execution model (often “pipelined”execution: PN director)
50 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Reengineering a Geoscientist’s Mineral Classification Workflow
26
51 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Job Management (here: NIMROD)Job Management (here: NIMROD)
•J ob management infrastructure in place• Results database: under development• Goal: 1000’s of GAMESS jobs (quantum mechanics) – Fall/Winter’04
52 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
ORB
27
53 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Rapid Web ServiceRapid Web Service--based Prototyping based Prototyping (Here: (Here: ROADNetROADNet Command & Control Services for LOOKING KickCommand & Control Services for LOOKING Kick--Off Off MtgMtg))
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
54 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
in KEPLER in KEPLER (w/ editable script)(w/ editable script)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
28
55 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
in KEPLER in KEPLER (interactive session)(interactive session)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
56 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Blurring Blurring Design (Design (ToDoToDo)) and Executionand Execution
29
57 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Scientific Workflow ChallengesScientific Workflow Challenges
•• Typical FeaturesTypical Features– data-intensive and/or compute-intensive– plumbing-intensive (consecutive web services won’t fit)– dataflow-oriented– distributed (remote data, remote processing)– user-interaction “in the middle”, …– … vs. (C-z; bg; fg)-ing (“detach” and reconnect)– advanced programming constructs (map(f), zip,
takewhile, …)– logging, provenance, “registering back” (intermediate)
products…
58 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
hand-crafted control solution; also: forces sequential execution!
designed to fit
designed to fit
hand-craftedWeb-service actor
Complex backward control-flow
No data transformations available
[Altintas-et-al-PIW-SSDBM’03][Altintas-et-al-PIW-SSDBM’03]
30
59 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
A Scientific Workflow Problem: Solved A Scientific Workflow Problem: Solved (Computer Scientist(Computer Scientist’’s view)s view)
•• Solution based on Solution based on declarative, functional declarative, functional dataflow process networkdataflow process network(= also a data streaming
model!)
•• HigherHigher--order constructs: order constructs: mapmap((ff) ) ⇒ no control-flow spaghetti⇒ data-intensive apps ⇒ free concurrent execution⇒ free type checking⇒ automatic support to go
from piw(GeneId) to PIW :=map(piw) over [GeneId]
map(f)-styleiterators
Powerful type checking
Generic, declarative “programming”
constructs
Generic data transformation actors
Forward-only, abstractable sub-workflow piw(GeneId)
60 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Promoter Identification Workflow RedesignedPromoter Identification Workflow Redesigned
map(GenbankWS)Input: {“NM_001924”, “NM020375”}Output: {“CAGT…AATATGAC",“GGGGA…CAAAGA“}
31
61 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
A Research Problem: A Research Problem: Optimization by RewritingOptimization by Rewriting
•• Example: PIW as a declarative, Example: PIW as a declarative, referentially transparent functional referentially transparent functional processprocess⇒ optimization via functional rewriting
possiblee.g. map(f o g) = map(f) o map(g)
•• Technical report & PIW specification in Technical report & PIW specification in HaskellHaskell
map(f o g) instead of
map(f) o map(g)
Combination of map and zip
http://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdfhttp://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
62 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
A KR+DI+Scientific Workflow ProblemA KR+DI+Scientific Workflow Problem•• Services can be Services can be semantically compatiblesemantically compatible, but , but
structurally incompatiblestructurally incompatible
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Incompatible
Compatible
(⋠)
(⊑)
δ(Ps)δ(Ps)δ (≺)
Ontologies (OWL)Ontologies (OWL)
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
32
63 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OntologyOntology--Informed Data Informed Data Transformation Transformation ((““StructureStructure--ShimShim””))
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Compatible (⊑)
RegistrationMapping (Output)
RegistrationMapping (Input)
CorrespondenceCorrespondence
Generate δ(Ps)δ(Ps)
Ontologies (OWL)Ontologies (OWL)
Transformation
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
64 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OutlineOutline
•• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
•• Scientific Data IntegrationScientific Data Integration
•• Scientific Workflow ManagementScientific Workflow Management
•• Links & Crystallization PointsLinks & Crystallization Points
•• SummarySummary
33
65 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
LinkLink--Up & Crystallization PointsUp & Crystallization Points•• Shared (Domain) Science Vision, GoalsShared (Domain) Science Vision, Goals
– NVO, SCEC, Human Genome Project, …•• Technology WavesTechnology Waves
– XML, web services, WSRF, Semantic Web (OWL), Portlets, …•• Standards for data exchange, metadata, data access Standards for data exchange, metadata, data access
protocols, protocols, ……– GML, EML, netCDF, HDF, …, ADN, …, DODS/OpenDAP, …– Organizations: W3C, GGF, …,
•• Community Community ontologiesontologies– GO (Gene Ontology), ecoinformatics, seismology, geochemistry, …– … from Saulus to Paulus …
•• Shared Community Tools and Tool CoShared Community Tools and Tool Co--DevelopmentDevelopment– SRB, Globus, …, Kepler, …
66 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Shared Science Vision, Goals: SCEC/CMEShared Science Vision, Goals: SCEC/CME
Simulation of Seismic Wave Simulation of Seismic Wave Propagation of a Magnitude 7.7 Propagation of a Magnitude 7.7 Earthquake on San Andreas Earthquake on San Andreas FaultFault– PIs: Thomas Jordan, Bernard
Minster, Reagan Moore, Carl Kesselman
– Simulation• 240 Processors for 5 days• 47 Terabytes of data generated
– SDSC SAC project optimized code on DataStar parallel computer (both MPI I/O management and checkpointing)
– Future simulation – Increase resolution a factor of 2, implies 1 PB of simulation results, 1000 processors for 20 days
Southern California Earthquake Center / Community Modeling EnvirSouthern California Earthquake Center / Community Modeling Environment onment ProjectProject
Source: Reagan Moore, SDSCSource: Reagan Moore, SDSC
34
67 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Example: NVO Community ProcessesExample: NVO Community Processes
-- created standard data encoding format (FITS image created standard data encoding format (FITS image format)format)
-- made accessible common digital holdings (sky survey images)made accessible common digital holdings (sky survey images)-- defined Uniform Content Descriptors (common metadata defined Uniform Content Descriptors (common metadata
attributes)attributes)-- created standard services (standard access mechanisms to created standard services (standard access mechanisms to
catalogs catalogs and surveys)and surveys)-- created digital library (manage derived data products)created digital library (manage derived data products)-- created portals (for combining services interactively)created portals (for combining services interactively)-- created processing pipelines (for automated processing)created processing pipelines (for automated processing)-- created preservation environmentcreated preservation environment
•• AndAnd…… found a new brown dwarf!found a new brown dwarf!
Source: Reagan Moore, SDSCSource: Reagan Moore, SDSC
68 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Semantic Mediation Semantic Mediation ““WaterfallWaterfall”” ……
Ontologies
Semantic Data,Service Annotation
IterativeDevelopment
Resource Discovery
ResourceIntegration
WorkflowPlanning
WorkflowAnalysis
Source: Shawn Bowers, SEEK AHM’04Source: Shawn Bowers, SEEK AHM’04
35
69 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
GEON Dataset Generation & RegistrationGEON Dataset Generation & Registration(a co(a co--development in KEPLER)development in KEPLER)
Xiaowen (SDM)Edward et al.(Ptolemy)
Yang (Ptolemy)
Efrat(GEON)
Ilkay(SDM)
SQL database access (J DBC)Matt,Chad, Dan et al. (SEEK)
% Makefile$> ant run
% Makefile$> ant run
70 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
KEPLER as a Melting Pot KEPLER as a Melting Pot ……•• A A grassgrass--rootsroots projectproject
– Needed a coalition of the (really!) willing •• IntraIntra--project project linkslinks
– e.g. in SEEK: AMS SMS EcoGrid•• InterInter--projectproject linkslinks
– SEEK ITR, GEON ITR, ROADNet ITRs, DOE SciDACSDM, Ptolemy II, NIH BIRN (coming we hope …), UK eScience myGrid, …
•• InterInter--technologytechnology linkslinks– Globus, SRB, JDBC, web services, soaplab services,
command line tools, R, GRASS, XSLT, …•• InterdisciplinaryInterdisciplinary linkslinks
– CS, IT, domain sciences, … (recently: usability engineer)
36
71 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
OutlineOutline
•• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
•• Scientific Data IntegrationScientific Data Integration
•• Scientific Workflow ManagementScientific Workflow Management
•• Links & Crystallization PointsLinks & Crystallization Points
•• SummarySummary
72 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Summary/Lessons Learned Summary/Lessons Learned •• Eat your own dogEat your own dog--food (or at least tryfood (or at least try……))
– start using your own (CI) tools early!•• Collaboration toolsCollaboration tools
– CVS repositories (+cvsview, webcvs)– Mailing lists (e.g. mailman googlified) – Bugzilla (detailed tracking of tech. issues & bugs)– WIKI (community authored web resource, e.g. high-level tech.
issues)•• Where is the XYZ repository/registry?Where is the XYZ repository/registry?
– EcoGrid (SEEK) registry, GEON registry, KEPLER actor & datasets repository, …
– UDDI what?•• CI CI ““Melting PotsMelting Pots””: :
– SDSC, NCEAS, LTER, NLADR (w/ NCSA), KU Specify, …– Genome Center@UC Davis (moving in …)
37
73 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Q & AQ & A
74 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Further ReadingFurther Reading
under review – available upon request from [email protected]
38
75 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Related PublicationsRelated Publications•• Semantic Data Registration and IntegrationSemantic Data Registration and Integration•• On Integrating Scientific Resources through Semantic RegistratioOn Integrating Scientific Resources through Semantic Registrationn, S. Bowers, K. Lin, and , S. Bowers, K. Lin, and
B. B. LudLudääscherscher, , 16th International Conference on Scientific and Statistical Data16th International Conference on Scientific and Statistical Database base ManagementManagement ((SSDBM'04SSDBM'04), 21), 21--23 June 2004, 23 June 2004, SantoriniSantorini Island, Greece. Island, Greece.
•• A System for Semantic Integration of Geologic Maps via A System for Semantic Integration of Geologic Maps via OntologiesOntologies, K. Lin and B. , K. Lin and B. LudLudääscherscher. In . In Semantic Web Technologies for Searching and Retrieving ScientifiSemantic Web Technologies for Searching and Retrieving Scientific Datac Data((SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
•• Towards a Generic Framework for Semantic Registration of ScientiTowards a Generic Framework for Semantic Registration of Scientific Datafic Data, S. Bowers and , S. Bowers and B. B. LudLudääscherscher. In . In Semantic Web Technologies for Searching and Retrieving ScientifiSemantic Web Technologies for Searching and Retrieving Scientific Datac Data((SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
•• The Role of XML in Mediated Data Integration Systems with ExamplThe Role of XML in Mediated Data Integration Systems with Examples from Geological es from Geological (Map) Data Interoperability(Map) Data Interoperability, B. , B. BrodaricBrodaric, B. , B. LudLudääscherscher, and K. Lin. In , and K. Lin. In Geological Society of Geological Society of America (GSA) Annual MeetingAmerica (GSA) Annual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
•• Semantic Mediation Services in Geologic Data Integration: A CaseSemantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Study from the GEON GridGrid, K. Lin, B. , K. Lin, B. LudLudääscherscher, B. , B. BrodaricBrodaric, D. , D. SeberSeber, C. , C. BaruBaru, and K. A. , and K. A. SinhaSinha. In . In Geological Geological Society of America (GSA) Annual MeetingSociety of America (GSA) Annual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
•• Query Planning and RewritingQuery Planning and Rewriting•• Processing FirstProcessing First--Order Queries under Limited Access PatternsOrder Queries under Limited Access Patterns, Alan Nash and B. , Alan Nash and B.
LudLudääscherscher, , Proc. 23rd ACM Symposium on Principles of Database SystemsProc. 23rd ACM Symposium on Principles of Database Systems ((PODS'04PODS'04) Paris, ) Paris, France, June 2004. France, June 2004.
•• Processing Unions of Conjunctive Queries with Negation under LimProcessing Unions of Conjunctive Queries with Negation under Limited Access Patternsited Access Patterns, Alan , Alan Nash and B. Nash and B. LudLudääscherscher., ., 9th Intl. Conference on Extending Database Technology9th Intl. Conference on Extending Database Technology ((EDBT'04EDBT'04) ) HeraklionHeraklion, Crete, Greece, March 2004, LNCS 2992. , Crete, Greece, March 2004, LNCS 2992.
•• Web Service Composition Through Declarative Queries: The Case ofWeb Service Composition Through Declarative Queries: The Case of Conjunctive Queries Conjunctive Queries with Union and Negationwith Union and Negation, B. , B. LudLudääscherscher and Alan Nash. Research abstract (poster), and Alan Nash. Research abstract (poster), 20th 20th Intl. Conference on Data EngineeringIntl. Conference on Data Engineering ((ICDE'04ICDE'04) Boston, IEEE Computer Society, April 2004.) Boston, IEEE Computer Society, April 2004.
76 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004 Scientific Data & WF Engineering, B.LudäscherNov. 15th 2004
Related PublicationsRelated Publications•• Scientific WorkflowsScientific Workflows•• KeplerKepler: An Extensible System for Design and Execution of Scientific Wo: An Extensible System for Design and Execution of Scientific Workflowsrkflows, I. , I. AltintasAltintas, C. , C.
Berkley, E. Jaeger, M. Jones, B. Berkley, E. Jaeger, M. Jones, B. LudLudääscherscher, S. Mock, , S. Mock, 16th International Conference on 16th International Conference on Scientific and Statistical Database ManagementScientific and Statistical Database Management ((SSDBM'04SSDBM'04), 21), 21--23 June 2004, 23 June 2004, SantoriniSantorini Island, Island, Greece. Greece.
•• KeplerKepler: Towards a Grid: Towards a Grid--Enabled System for Scientific WorkflowsEnabled System for Scientific Workflows, , IlkayIlkay AltintasAltintas, Chad Berkley, , Chad Berkley, EfratEfrat Jaeger, Matthew Jones, Bertram Jaeger, Matthew Jones, Bertram LudLudääscherscher, Steve Mock, , Steve Mock, Workflow in Grid Systems Workflow in Grid Systems (GGF10)(GGF10), Berlin, March 9th, 2004., Berlin, March 9th, 2004.
•• An OntologyAn Ontology--Driven Framework for Data Transformation in Scientific WorkflowsDriven Framework for Data Transformation in Scientific Workflows, S. Bowers and , S. Bowers and B. B. LudLudääscherscher, , Intl. Workshop on Data Integration in the Life SciencesIntl. Workshop on Data Integration in the Life Sciences ((DILS'04DILS'04), March 25), March 25--26, 26, 2004 Leipzig, Germany, LNCS 2994. 2004 Leipzig, Germany, LNCS 2994.
•• A Web Service Composition and Deployment Framework for ScientifiA Web Service Composition and Deployment Framework for Scientific Workflows, I. c Workflows, I. AltintasAltintas, , E. Jaeger, K. Lin, B. E. Jaeger, K. Lin, B. LudaescherLudaescher, A. , A. MemonMemon, In the , In the 2nd Intl. Conference on Web Services2nd Intl. Conference on Web Services((ICWSICWS), San Diego, California, July 2004.), San Diego, California, July 2004.