1USC INFORMATION SCIENCES INSTITUTE Yolanda Gil
Artificial Intelligence and Large-Scope Science:
Workflow Planning and Beyond
Yolanda Gil
USC/Information Sciences Institute
www.isi.edu/~gil
In collaboration with others in the Intelligent Systems Division and the Center for Grid Technologies at USC/ISI including:
Ewa Deelman, Carl Kesselman, Jim Blythe
Supported in part by NSF’s GriPhyN and SCEC/CME projects, and by internal grants from USC/ISI
Outline
Motivation
• Large-scope large-scale science
• Challenges and opportunities for Artificial Intelligence
Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows
Future directions in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids
Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation
The Southern California Earthquake Center’s Community Modeling Environment (SCEC-CME) (http://iowa.usc.edu/cmeportal/)
Integrating Diverse Models of Complex Phenomena…
Fault models
Fault ruptures
Wave propagation
Historic records
Site response models
Effect on structures
…for Broader Use
Geophysicists, civil and structural engineers, city planners, emergency managers, …
• Analyze seismic hazard
• Learn and understand seismic hazard
Of course, scientists need this infrastructure as well!
Not Just Large-Scale and HPC Issues: Large-Scope Science and Engineering Research
“Whereas large-scale means increasing the resolution of the solution to a fixed physical model problem, large-scope means increasing the physical complexity of the model itself. Increasing the scope involves adding more physical realism to the simulation, making the actual code more complex and heterogeneous, while keeping the resolution more or less constant.”
-- Report from ACM Workshop on Strategic Directions in Computing Research, A. Sameh et al on Computational Science and Engineering, June 1996
How This is Done Today
Scientists:
• Verbal communication needed to compose models
• When an earthquake occurs, hard to respond quickly
Other users (e.g., building engineers):
• Use models based on correlations of historical data
• Employ consultants that know how to set up these models
• Delay in accessing state-of-the-art scientific models
Scientific Workflows
Models composed into end-to-end scientific workflows that model/analyze complex physical phenomena
• In-silico experimentation
• Data collection and analysis
Reproducibility, reusability, pedigree
[Figure: example workflow that computes a seismic hazard curve. Recoverable labels:]
Task Result: Hazard curve: SA vs. prob. exc.
Hazard Curve Calculator: SA vs. prob. exc. (labels: SA exc. probs., Ruptures)
IMR: SA exc. prob., Field (2000) (labels: Rupture, Site VS30, Site Basin-Depth-2.5, SA Period, Gaussian Truncation, Std. Dev. Type)
Basin-Depth Calculator (labels: Basin-Depth, Lat/Long.)
UTM Converter (get-Lat-Long-given-UTM) (labels: Lat/Long., UTM)
CVM-get-Velocity-at-point (labels: Velocity, Lat/Long.)
Ruptures: PEER-Fault (labels: Gaussian Dist, No Truncation, Total Moment Rate, Duration-Year, Fault-Grid-Spacing, Rupture Offset, Mag-Length-sigma, Dip, Rake, Magnitude (min), Magnitude (max), Magnitude (mean), rfml)
Executing Scientific Workflows on Grids
Grids support this process through middleware services:
• Seamless integration and management of resources (OGSA)
• Job submission (Condor)
• Resource Monitoring and Directory Service (MDS)
• Replica Location Service (RLS)
• Metadata Catalog Services (MCS)
[Figure, from [Kesselman 04]: grid services supporting discovery, access, resource management, security, and policy. Recoverable annotations:]
Discovery: many sources of data, services, computation
Registries organize services of interest to a community
Access: data integration activities may require access to, and exploration/analysis of, data at many locations
Exploration and analysis may involve complex, multi-step workflows
Resource management is needed to ensure progress and arbitrate competing demands
Security and policy must underlie access and management decisions
Application Development and Execution Process
[Figure: levels from the application domain down to the execution environment. Recoverable labels:]
Application Domain: FFT
Abstract Workflow: FFT filea
Concrete Workflow: transfer filea from host1://home/filea to host2://home/file1, then /usr/local/bin/fft /home/file1
Execution Environment: host1, host2, data transfer
Steps between levels: Application Component Selection; Resource Selection; Data Replica Selection; Transformation Instance Selection
Failure Recovery Method: Retry; Pick different Resources; Specify a Different Workflow
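The FFT example in the figure can be read as a mapping from one abstract task to a short list of concrete steps. The following is a minimal sketch in Python, not Pegasus code: `AbstractTask`, the two catalog dictionaries, and `concretize` are hypothetical names introduced for illustration, and the catalog contents are assumed from the labels on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractTask:
    transformation: str  # logical component name, e.g. "fft"
    input_file: str      # logical file name, e.g. "filea"


# Hypothetical catalogs (assumed layout, for illustration only).
# Logical file name -> physical replica location.
REPLICA_CATALOG = {"filea": "host1://home/filea"}
# (component, host) -> path of the installed executable on that host.
TRANSFORMATION_CATALOG = {("fft", "host2"): "/usr/local/bin/fft"}


def concretize(task, host, local_path):
    """Map one abstract task to concrete steps that run it on `host`."""
    executable = TRANSFORMATION_CATALOG[(task.transformation, host)]
    source = REPLICA_CATALOG[task.input_file]
    steps = []
    # Add a data transfer only when the replica is not already on the host.
    if not source.startswith(f"{host}://"):
        steps.append(
            f"transfer {task.input_file} from {source} to {host}:/{local_path}"
        )
    steps.append(f"{executable} {local_path}")
    return steps


for step in concretize(AbstractTask("fft", "filea"), "host2", "/home/file1"):
    print(step)
```

Under these assumptions the sketch reproduces the two concrete steps shown on the slide: a transfer from host1 to host2 followed by the FFT invocation on the transferred file.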
Challenges
Complexity: Many choices are involved as workflow is composed
• Alternative application components, files, and locations
• Many different interdependencies may occur among components
• May reach many dead ends
Usability: Users should not need to be aware of infrastructure details
• Files are distributed, indexed, replicated
• Match application requirements to host capabilities
Solution cost: Evaluate the alternative solution costs
• Performance
• Reliability
• Resource Usage
Global cost: minimizing cost across organizations
• Individual user’s choices in light of other users’ choices
Reliability of execution: job resubmission upon failure
• Detection, diagnosis, repair
• Anticipation and avoidance, resource reservations
Challenges and opportunities for Artificial Intelligence
We need alternative foundations that offer
• expressive representations to capture the complex knowledge involved in both the application domain and the execution environment
• flexible reasoners to explore this complex space systematically and incorporate constraints, tradeoffs, policies
Many Artificial Intelligence (AI) techniques are relevant:
– Planning to achieve given requirements
– Searching through problem spaces of related choices
– Using and combining heuristics
– Reasoners that can incorporate rules, definitions, axioms, etc.
– Schedulers and resource allocation techniques
– Coordination and communication in distributed problem solving
– Expressive knowledge representation languages
– Reasoning under uncertainty
– Dynamic replanning and reactive control
– Learning in complex dynamic environments
– Learning to improve problem solving skills
Outline
Motivation
• Scientific workflows
• Challenges and opportunities for Artificial Intelligence
Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows
Future directions in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids
Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation
Reasoning about Distributed Execution Infrastructure in Grids with Pegasus (work with J. Blythe, E. Deelman, C. Kesselman, and others)
[Figure: Pegasus architecture. Recoverable labels:]
Components: Request Manager; Workflow Generation; Virtual Data Language; Chimera; Workflow Planning; Workflow Reduction; Replica and Resource Selector; Data Management; Data Publication; Transformation Catalog; Submission and Monitoring System; workflow executor (DAGMan); Execution; Grid
Information and Models: Globus Replica Location Service; Globus Monitoring and Discovery Service; Dynamic information
Data flow labels: detector; Raw data; Abstract Workflow; Concrete Workflow; tasks; Replica Location; Available Resources; Monitoring information
[Gil et al, IEEE IS 04]
Pegasus: Using AI Planning Techniques to Generate Executable Grid Workflows
Given: desired result and constraints
• A desired result (high-level description of data product)
• A set of application components described in the grid
• A set of resources in the grid (dynamic, distributed)
• A set of constraints and preferences on solution quality
Find: an executable job workflow
• A configuration of components that generates the desired result
• A specification of resources where components can be executed and data can be stored
• A specification of data sources and data movements
Approach: Use AI planning techniques to search the solution space and evaluate tradeoffs
• Exploit heuristics to direct the search for solutions and represent optimality and policy criteria
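The "given / find" formulation above can be illustrated as a small search over assignments of components to resources. The sketch below is a generic depth-first search with backtracking and cost-based pruning, written in Python. It is not the Pegasus planner: the function `plan`, the example tables (`INSTALLED`, `DEPENDS_ON`, `RUN_COST`, `TRANSFER_COST`), and the cost model are all assumptions made for illustration, and the pruning step assumes costs are never negative.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, str]  # task name -> host name


def plan(
    tasks: List[str],
    hosts: List[str],
    can_run: Callable[[str, str], bool],
    step_cost: Callable[[str, str, Assignment], float],
) -> Optional[Tuple[float, Assignment]]:
    """Return the cheapest (cost, assignment), or None if no plan exists.

    `tasks` must be listed so that each task follows its predecessors.
    """
    best: Optional[Tuple[float, Assignment]] = None

    def search(i: int, assignment: Assignment, total: float) -> None:
        nonlocal best
        # Prune: a partial plan that already costs as much as the best
        # complete plan cannot improve on it (costs are non-negative).
        if best is not None and total >= best[0]:
            return
        if i == len(tasks):
            best = (total, dict(assignment))
            return
        task = tasks[i]
        for host in hosts:
            if not can_run(task, host):
                continue  # constraint violated: dead end for this choice
            assignment[task] = host
            search(i + 1, assignment, total + step_cost(task, host, assignment))
            del assignment[task]  # backtrack and try the next host

    search(0, {}, 0.0)
    return best


# Hypothetical example data.
INSTALLED = {"extract": {"host1", "host2"}, "fft": {"host2"}}
DEPENDS_ON = {"fft": ["extract"]}
RUN_COST = {"host1": 1.0, "host2": 3.0}
TRANSFER_COST = 5.0


def can_run(task: str, host: str) -> bool:
    return host in INSTALLED[task]


def step_cost(task: str, host: str, assignment: Assignment) -> float:
    # Pay a transfer for every predecessor placed on a different host.
    transfers = sum(1 for p in DEPENDS_ON.get(task, []) if assignment[p] != host)
    return RUN_COST[host] + TRANSFER_COST * transfers


result = plan(["extract", "fft"], ["host1", "host2"], can_run, step_cost)
print(result)
```

In this toy setting the search first finds a plan that runs `extract` on the cheaper host and pays for a transfer, then backtracks and finds that placing both tasks on `host2` is cheaper overall. This is the kind of tradeoff the slide refers to when it mentions evaluating alternatives.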
Advantages of Using AI Planning
Provide broad-based, generic foundation
Use general techniques to search for solutions
Explores alternatives, supports backtracking
Incorporates domain-specific and domain-independent heuristics (as search control rules)
Allow easy addition of new constraints and rules
Incorporate optimality and policy into the search for solutions
Interleave decisions at various levels
Can integrate the generation of workflows across users and policies within virtual orgs.
Reasoning about Workflows in Pegasus
[Figure: workflow graphs over data processing tasks a to i, showing the original workflow, the desired results, and the final workflow. Key: the original node; input transfer node; registration node; output transfer node; unnecessary nodes.]
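The "unnecessary nodes" in the figure key suggest a reduction step that drops tasks whose results are not needed. The sketch below shows one plausible rule in Python: starting from the desired results, keep only the tasks needed to produce data that is not already available. The rule itself, the function name `reduce_workflow`, and the example tasks are assumptions for illustration. The slide does not spell out the exact criterion Pegasus uses.

```python
from typing import Dict, Set


def reduce_workflow(
    produces: Dict[str, Set[str]],
    consumes: Dict[str, Set[str]],
    desired: Set[str],
    available: Set[str],
) -> Set[str]:
    """Return the tasks that still have to run.

    produces:  task -> files it outputs
    consumes:  task -> files it needs as input
    desired:   files the user asked for
    available: files that already exist (e.g. registered replicas)

    Raises KeyError if a missing file has no producing task.
    """
    producer = {f: task for task, files in produces.items() for f in files}
    needed: Set[str] = set()
    pending = [f for f in desired if f not in available]
    while pending:
        missing = pending.pop()
        task = producer[missing]
        if task in needed:
            continue
        needed.add(task)
        # Only chase inputs that are not already available.
        pending.extend(f for f in consumes[task] if f not in available)
    return needed


# Hypothetical three-step chain: raw -> a -> b -> c.
produces = {"t1": {"a"}, "t2": {"b"}, "t3": {"c"}}
consumes = {"t1": {"raw"}, "t2": {"a"}, "t3": {"b"}}

print(reduce_workflow(produces, consumes, {"c"}, {"raw"}))
print(reduce_workflow(produces, consumes, {"c"}, {"raw", "b"}))
```

When the intermediate file `b` already exists, only `t3` remains and the other two tasks become unnecessary. A full planner would then add transfer and registration nodes around the remaining tasks, as the figure key indicates.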
Pegasus Application Domains (work with E. Deelman and dozens of scientists)
Pulsar search for gravitational-wave physics (LIGO)
• 975 tasks, 1365 data transfers, 975 output files, 96 hrs runtime
Galaxy morphology for NVO and NASA in Montage
Tomography for neural structure reconstruction
High-energy physics – Compact Muon Solenoid
• 7 days, 678 jobs, produced ~200 GB
Gene alignment
• In 24 hours, ~10,000 Grid jobs, >200,000 BLAST executions, produced 50 GB
Small Montage Workflow
~1200 nodes [Deelman et al, 04]
Artemis: Integrating Distributed Info Sources on the Grid (work with E. Deelman, S. Thakkar, R. Tuchinda)
[Figure: Artemis architecture. Recoverable labels: User; Query Wizard; Ontology; Entity selection; Filters; Dynamic Model Generator; Model mappings; Data Source Models; Prometheus Query Mediator; Theseus query execution; Data Sources; Metadata Catalog Services (several instances).]
[Tuchinda et al, IAAI-04]
Scientific Workflows: Future Directions
Using AI to support the workflow creation process
• Interactive assistance and automatic completion
Using AI to support the scientific experimentation process
• Active workflows
Using AI to augment the execution infrastructure
• Cognitive grids
The Process of Creating an Executable Workflow
1. Creating a valid workflow template (human guided)
• Selecting application components and connecting inputs and outputs
• Adding other steps for data conversions/transformations
2. Creating instantiated workflow
• Providing input data to pathway inputs (logical assignments)
3. Creating executable workflow (automatically)
• Given requirements of each model, find and assign adequate resources for each model
• Select physical locations for logical names
• Include data movement steps, including data deposition steps
User guided
Automated
Challenges for Interactive Composition of Valid Workflow Templates
Provide flexible interaction
• User can start from initial data, from data products, or steps
• User can specify abstract descriptions of steps and later specialize them
• User can reuse, merge, or build from scratch
Automatic tracking of workflow constraints
• User is notified if there are problems but does not have to keep track of details
Proactive assistance
• System should not just point out problems but help user by suggesting fixes (always)
And… how do we define what “valid” means?
Assisting Users in Creating Workflow Templates (with J. Kim and M. Spraragen)
User interaction results in modifications to workflows
• Specify desired result, external/user provided input
• Add/remove step, add/remove link
• Specialize step (e.g., IMR -> IMR-SA)
As user creates a workflow, intermediate stages result in possibly incorrect workflows
ErrorScan algorithm detects errors and generates possible fixes
• Knowledge base that represents components and constraints
• Formal definitions of desirable properties of workflows based on AI planning techniques
Fixes are multi-step and “click-through”
Errors and fixes are ranked using heuristics
If no errors detected, workflow is guaranteed to be correct
[Kim et al, IUI-04] [Spraragen et al, 04]
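To make the idea of detecting errors and proposing fixes concrete, here is a small Python sketch of a template checker. It is not the ErrorScan algorithm from the cited papers. It checks only two properties chosen for illustration (every step input is satisfied, every desired result is produced), and the names `Step`, `scan`, and the example components are hypothetical. Ranking of errors and fixes is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def scan(
    steps: Iterable[Step],
    user_inputs: Set[str],
    desired: Iterable[str],
    library: Iterable[Step],
) -> List[Tuple[str, List[str]]]:
    """Return (error, suggested fixes) pairs for a workflow template."""
    steps = list(steps)
    library = list(library)
    # Everything the template can currently supply.
    produced = set(user_inputs)
    for step in steps:
        produced.update(step.outputs)

    def producers(data: str) -> List[str]:
        # Suggest library components whose outputs include the missing data.
        return [f"add step {c.name}" for c in library if data in c.outputs]

    findings = []
    for step in steps:
        for needed in step.inputs:
            if needed not in produced:
                fixes = producers(needed) + [f"provide {needed} as user input"]
                findings.append((f"{step.name}: input {needed} is unsatisfied", fixes))
    for result in desired:
        if result not in produced:
            findings.append((f"desired result {result} is not produced", producers(result)))
    return findings


# Hypothetical components, loosely named after labels used in this talk.
hazard = Step("HazardCurveCalculator", ("SA exc. prob.",), ("hazard curve",))
imr = Step("IMR-SA", ("rupture", "site VS30"), ("SA exc. prob.",))

for error, fixes in scan([hazard], {"rupture"}, ["hazard curve"], [imr]):
    print(error, "->", fixes)
```

After the user accepts the first suggestion and supplies the site parameter, `scan([imr, hazard], {"rupture", "site VS30"}, ["hazard curve"], [imr])` returns an empty list. In this sketch an empty list means only that these two checks pass, which is weaker than the correctness guarantee stated on the slide.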
Supporting the Interactive and Incremental Nature of Scientific Exploration (with M. Ellisman, E. Deelman, C. Kesselman)
Workflows cannot always be created in advance
• Experimental design depends on initial / partial results
• Scientific experimentation is often exploratory
Need to support interactive and incremental creation and execution of workflows
Active workflows: represent evolving workflows and are continually authored, refined, executed, and modified
Supporting the Evolution of Active Workflows (I)
Supporting the Evolution of Active Workflows (II)
Supporting the Evolution of Active Workflows (and III)
Pervasive Knowledge Sources and Reasoners (work with J. Blythe, E. Deelman, C. Kesselman, H. Tangmurarunkit)
[Figure: architecture diagram. Recoverable labels:]
High-level specification of desired results, constraints, requirements, user policies
Workflow Manager; Smart Workflow Pool; Workflow History
Intelligent Reasoners: Workflow Refinement; Workflow Repair; Resource Matching; Policy Management
Pervasive Knowledge Sources: Application KB; Resource KB; Policy KB; Other KB
Policy Information Services; Resource Indexes; Replica Locators; Other Grid services
Community Distributed Resources (e.g., computers, storage, network, simulation codes, data)
[Gil et al, IEEE IS 04]
Cognitive Grids: Pervasive Semantic Representations of the Environment at all Levels
[Figure: layered architecture. Recoverable labels:]
Layers: Grid Resources (Compute, Data, Network); Basic Grid Middleware (Globus Toolkit, Condor-G, DAGMan); Higher-Level Service (Virtual Data Tools, Resource Brokers); Intelligent Reasoners (matchmaking, refinement, repair, coordination, negotiation…); Users and Applications
Knowledge and information: Semantic Resource Descriptions; Resource Knowledge-bases; Application Component Models; Resource Policy Descriptions; User and VO policy models; Policy Knowledge-bases; Semantics for File-based data
Flow labels: High-level Request descriptions; Refined Workflow; Tasks; Provenance and Monitoring; Monitoring, Resources knowledge; Current Request Status, Results, Provenance Information
Cognitive Grids: Distributed Intelligent Reasoners that Incrementally Generate the Workflow
[Figure: workflow generation plotted over time against levels of abstraction. Recoverable labels:]
Levels of abstraction: Application-level knowledge; Logical tasks; Tasks bound to resources and sent for execution
Stages over time: User’s Request; Relevant components; Full abstract workflow; Partial execution (executed, not yet executed)
Reasoners: Workflow refinement; Onto-based Matchmaker; Workflow repair; Policy reasoner
Many Opportunities for AI Techniques
The Grid Now
Syntax-based matchmaking of resources to job requirements
• Condor matchmaker
• Attribute based discovery and selection
Scheduling of jobs based on
Grid-able users that specify job execution sequences and computing requirements
• Scripting languages
• Workflow languages
• Task graphs
Explicit mappings from task to jobs, simple job brokers
Explicit service negotiation and recovery strategies
The Future Grid
Knowledge-based reasoning about resources enables
• Semantic matchmaking
• Aggregate resource reasoning
Task-level reasoning to plan and schedule jobs and resources
• More agility and coordination
Wide range of users can specify high level requirements in a mixed-initiative mode
• Mapping of high-level requirements to details required for execution
End-to-end resource negotiation and adaptive strategies to accommodate failure
Outline
Motivation
• Scientific workflows
• Challenges and opportunities for Artificial Intelligence
Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows
Future research in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids
Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation
Knowledge Infrastructure for Science: Challenges in Community-Based Knowledge Capture & Representation
1. be a community-wide effort
2. have community-wide acceptance
3. be used in practice on a daily basis to compose simulation code and annotate their results
Scientists Ask Lots of Questions, Knowledge Representation has few Answers
How do you get started?
How to ensure the community will accept it (use it)?
How do you (can you?) represent alternative views?
What is the process to contribute to it?
What is the process to make changes to it?
What is the impact to my application when there is an update?
How is it implemented?
How is it managed?
Who does what, when, where, why?
SCEC/GO Workshop on Ontology Development: Lessons Learned and Prospects [Bada et al, forthcoming]
SCEC learns from the Gene Ontology (GO) experience (Workshop Nov’02, Cambridge UK):
• Had a successful jumpstart
• Done by biologists, not knowledge engineers
• Developed by a wide, distributed community
• Focused on specific aspects of genomics
– Fly-base, yeast, mouse
• Used 24/7 from day 1
• Accepted widely by the community
• Extended based on use requirements of a wide community
• Quite large (13K terms)
• Simple (and messy) representation
• Simple infrastructure
• Process to accommodate changes, curation
Some Policies for Organizing Contributions
Curated by knowledge engineers: processes changes requested by users
• http://www.ecocyc.org
Curated by domain experts: group of domain curators processes changes requested by users
• http://www.geneontology.org
Open contributions: any user can add content
• http://www.dmoz.org, http://www.openmind.org
Open editing: any user can edit and create any page on a web site.
• http://wiki.org
Broad Range of Contributors of Scientific Knowledge (with T. Chklovski)
[Figure: spectrum of contributors of scientific knowledge. Recoverable labels:]
One end: More inexpensive; More inaccurate; More ambiguous; Deeper into society/impact
Other end (formal statements, e.g. subclassOf …): More expensive; More accurate; More concrete; Deeper into the science
Thank you!
Scientific workflows
• pegasus.isi.edu
Cognitive grids
• www.isi.edu/ikcap/cognitive-grids
AI and science
• IEEE Intelligent Systems Jan/Feb 2004, De Roure, Gil, Hendler (Eds), Special issue on e-Science
www.isi.edu/~gil
“As We May Think”
“Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them […]. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. […] The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior. […]
There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which [their additions] were erected.”
--- Vannevar Bush, 1945
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Searching for Pulsars with the Pegasus Planner
Used AI planning techniques to compose executable grid workflows with hundreds of jobs
Applied to data from the Laser Interferometer Gravitational-Wave Observatory (LIGO), which aims to detect gravitational waves predicted by Einstein’s theory of relativity
Used LIGO’s data collected during the first scientific run of the instruments in Fall 2002
Targeted a set of 1000 locations of known pulsars as well as random locations in the sky
Performed using compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee.