Introduction to the Kepler Workflow System
Matthew B. Jones
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California, Santa Barbara
Software Tools for Sensor Networks
A Workshop sponsored by NCEAS, LTER, and DataONE
May 1-5, 2012
Abstract
• Scientific workflows capture the transformation of data that are produced and consumed by disparate analysis and modeling software systems. Kepler is an open-source system for authoring and executing workflows, providing access to data and services from a variety of networks and systems. By versioning data, workflows, and executions, Kepler allows full reconstruction of the analyses used in scientific papers, even if those analyses are conducted using a variety of commercial and custom software. Kepler promotes reproducible science by allowing users to publish these workflows, data products, and execution traces to remote repositories to be shared with other users.
Diverse Analysis and Modeling
• Wide variety of analyses used in ecology and environmental sciences
  – Statistical analyses and trends
  – Rule-based models
  – Dynamic models (e.g., continuous time)
  – Individual-based models (agent-based)
  – many others
• Implemented in many frameworks
  – implementations are black boxes
  – learning curves can be steep
  – difficult to couple models
Analysis/Modeling Challenges
• Manual process to work with multiple analytical systems
• Data are discovered outside of tools and imported manually
• Difficult to understand models at a glance
• Difficult to revise analyses except in scripted systems
• No accepted way to publish models to share with colleagues
• Little re-use of components – many re-inventions
• Difficult to use multiple computers for one analysis/model
  – Only a few experts use grid computing
Reproducible Science
• Analytical transparency
  – open systems
  – works across analysis packages
  – documents algorithms completely
• Automated analysis for repeatability
  – must be scriptable
  – must be able to handle data dynamically
• Archived and shared analysis and model runs
Models as ‘scientific workflows’
• Current analytical practices are difficult to manage
• Model the steps used by researchers during analysis
  – Graphical model of flow of data among processing steps
• Each step often occurs in different software
  – Matlab, R, SAS, C/C++, Fortran, Swarm, ...
  – Each component can ‘wrap’ external systems, presenting a unified view
• Refer to these graphs as ‘Scientific Workflows’
[Figure: processing steps labeled Data, Clean, Analyze/Model, and Graph]
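The idea that a component can ‘wrap’ an external program and present it through a uniform interface can be sketched in Python. This is a hypothetical illustration, not Kepler’s actor API (Kepler actors are Java classes built on Ptolemy II): the `ExternalToolActor` class is invented for this example, it assumes the wrapped program reads one number per line on standard input and prints a single number, and a small Python one-liner stands in for an R or Matlab script.

```python
import subprocess
import sys


class ExternalToolActor:
    """Hypothetical component that wraps an external program.

    Input values are written to the program's standard input, one per line;
    the program's standard output is parsed as a single number.
    """

    def __init__(self, name, command):
        self.name = name
        self.command = command  # program and its arguments, as a list

    def fire(self, values):
        stdin_text = "".join(f"{v}\n" for v in values)
        result = subprocess.run(
            self.command,
            input=stdin_text,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool fails
        )
        return float(result.stdout.strip())


# Stand-in for an R or Matlab script: averages the numbers it reads.
MEAN_SCRIPT = (
    "import sys; "
    "xs = [float(line) for line in sys.stdin if line.strip()]; "
    "print(sum(xs) / len(xs))"
)

mean_actor = ExternalToolActor("mean", [sys.executable, "-c", MEAN_SCRIPT])

if __name__ == "__main__":
    print(mean_actor.fire([24.0, 24.5, 25.0]))
```

Swapping in a different tool changes only `command`; downstream components still see the same `fire` interface.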
Scientific workflows
• What are scientific workflows?
  – Graphical model of data flow among processing steps
  – Inputs and outputs of components are precisely defined
  – Components are modular and reusable
  – Flow of data controlled by a separate execution model
  – Support for hierarchical models
[Figure: workflow graph with A as a source (e.g., data), B as a processor (e.g., regression), and C as a sink (e.g., display)]
[Figure builds: the same workflow graph, extended first with a component labeled A’ and then with components labeled D, E, and F]
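The separation between the graph of components and the execution model that moves data through it can be illustrated with a small Python sketch. In Kepler (via Ptolemy II) this role is played by a “director”; the classes below are hypothetical and implement only one simple policy (fire each actor once, in dependency order, on an acyclic graph), not any particular Kepler director. Actors A, B, and C mirror the source, processor, and sink of the figure, with a least-squares slope standing in for “regression”.

```python
from collections import defaultdict, deque


class Actor:
    """A workflow component: a named function from its inputs to one output."""

    def __init__(self, name, func):
        self.name = name
        self.func = func


class Workflow:
    """The graph only: which actors exist and how they are connected."""

    def __init__(self):
        self.actors = {}
        self.edges = defaultdict(list)  # producer name -> consumer names

    def add(self, actor):
        self.actors[actor.name] = actor
        return actor

    def connect(self, src, dst):
        self.edges[src].append(dst)


class SequentialDirector:
    """A separate execution model: fires each actor once, in topological
    order, passing every output to the downstream actors."""

    def run(self, wf):
        indegree = {name: 0 for name in wf.actors}
        for consumers in wf.edges.values():
            for dst in consumers:
                indegree[dst] += 1
        inbox = defaultdict(list)
        ready = deque(name for name, n in indegree.items() if n == 0)
        results = {}
        while ready:
            name = ready.popleft()
            results[name] = wf.actors[name].func(*inbox[name])
            for dst in wf.edges[name]:
                inbox[dst].append(results[name])
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        if len(results) != len(wf.actors):
            raise ValueError("cycle detected: this director only runs acyclic graphs")
        return results


def least_squares_slope(points):
    """Processor B: ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


wf = Workflow()
wf.add(Actor("A", lambda: [(1, 3.0), (2, 5.0), (3, 7.0)]))  # source (data)
wf.add(Actor("B", least_squares_slope))                      # processor (regression)
wf.add(Actor("C", lambda slope: f"slope = {slope:.2f}"))     # sink (display)
wf.connect("A", "B")
wf.connect("B", "C")
results = SequentialDirector().run(wf)
```

Because the director is a separate object, the same graph could in principle be handed to a different scheduler (for example, one that fires actors in parallel) without changing the actors themselves.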
Outline
• Overview of Kepler
• Features
  – Data access
  – Workflow archiving and sharing
  – Grid computing support
• Open source community
Overview of Kepler
Goals
• Produce an open-source scientific workflow system
  – enable scientists to design, share, and execute scientific workflows
• Support scientists in a variety of disciplines
  – e.g., biology, ecology, oceanography, astronomy
• Important features
  – access to scientific data
  – flexible framework that works across analytical packages
  – simplify distributed computing using computing grids
  – clear documentation of analysis and models
  – effective user interface for workflow design
  – provenance tracking for results
  – model archiving and sharing
Kepler use cases represent many science domains
• Ecology
  – SEEK: Ecological Niche Modeling
  – COMET: environmental science
  – REAP: Parasite invasions using sensor networks
• Geosciences
  – GEON: LiDAR data processing
  – GEON: Geological data integration
• Molecular biology
  – SDM: Gene promoter identification
  – ChIP-chip: genome-scale research
  – CAMERA: metagenomics
• Oceanography
  – REAP: SST data processing
  – LOOKING: ocean observing CI
  – NORIA: ocean observing CI
  – ROADNet: real-time data modeling
  – Ocean Life project
• Physics
  – CPES: Plasma fusion simulation
  – FermiLab: particle physics
• Phylogenetics
  – ATOL: Processing Phylodata
  – CiPRES: phylogenetic tools
• Chemistry
  – Resurgence: Computational chemistry
  – DART (X-Ray crystallography)
• Library Science
  – DIGARCH: Digital preservation
  – Cheshire digital library: archival
• Conservation Biology
  – SanParks: Thresholds of Potential Concerns
Anatomy of a Kepler Workflow
Actors
Channels
Ports
Tokens: int, string, record{..}, array[..], ...
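Tokens are the typed values that travel along channels from one actor’s port to another’s. A minimal Python sketch of port-level type checking, assuming a simplified type system with just the int, string, record, and array kinds named on the slide; `token_type`, `Port`, and `Channel` are hypothetical stand-ins, not Ptolemy II’s Java token classes (such as `IntToken` or `RecordToken`) or its type-resolution machinery.

```python
from dataclasses import dataclass, field


def token_type(value):
    """Infer a simplified token type name, e.g. 'int' or 'array[int]'."""
    if isinstance(value, bool):
        raise TypeError("boolean tokens are not modelled in this sketch")
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):  # record: named fields, listed in name order
        fields = ", ".join(f"{k}={token_type(v)}" for k, v in sorted(value.items()))
        return "record{" + fields + "}"
    if isinstance(value, list):  # array: every element shares one type
        element_types = {token_type(v) for v in value}
        if len(element_types) != 1:
            raise TypeError("arrays must be non-empty with one element type")
        return "array[" + element_types.pop() + "]"
    raise TypeError(f"unsupported token value: {value!r}")


@dataclass
class Port:
    """An input port that accepts only tokens of its declared type."""
    name: str
    declared_type: str
    received: list = field(default_factory=list)

    def accept(self, value):
        actual = token_type(value)
        if actual != self.declared_type:
            raise TypeError(f"port {self.name!r} expects {self.declared_type}, got {actual}")
        self.received.append(value)


@dataclass
class Channel:
    """Connects an output to an input port and delivers tokens in order."""
    destination: Port

    def send(self, value):
        self.destination.accept(value)
```

Checking types where tokens enter a port means a mismatched connection fails at the boundary between two components rather than deep inside one of them.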
Kepler scientific workflow system
Data source from repository
res <- lm(BARO ~ T_AIR)
res
plot(T_AIR, BARO)
abline(res)
R processing script
Run Management
• Each execution recorded
• Provenance of derived data recorded
• Can archive runs and derived data
A Simple Kepler Workflow
Component Tab
Workflow Run Manager
Searchable Component List
Component Documentation
Data preparation
FORTRAN code
MATLAB code
Data Access
Accessing Data in Kepler
• File system (e.g., CSV files)
• Catalog searches (e.g., KNB)
• Remote databases (e.g., PostgreSQL)
• Web services
• Data access protocols (e.g., OPeNDAP)
• Streaming data (e.g., DataTurbine)
• Specialized repositories (e.g., SRB)
• etc., and extensible
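One way to picture an “extensible” set of data sources is a registry that maps a kind of source to a reader function, so supporting a new protocol means registering one more reader. The registry, the `register_source` decorator, and `open_source` below are hypothetical illustrations in Python, not Kepler’s data-access API, and only a local CSV reader is implemented.

```python
import csv
import io

READERS = {}  # source kind -> reader function


def register_source(kind):
    """Decorator that registers a reader for one kind of data source."""
    def wrap(func):
        READERS[kind] = func
        return func
    return wrap


@register_source("csv")
def read_csv(location):
    """Read a CSV file path (or an open text stream) into a list of row dicts."""
    if hasattr(location, "read"):
        return list(csv.DictReader(location))
    with open(location, newline="") as f:
        return list(csv.DictReader(f))


def open_source(kind, location):
    """Dispatch to whichever reader is registered for this kind of source."""
    try:
        reader = READERS[kind]
    except KeyError:
        raise ValueError(f"no reader registered for source kind {kind!r}") from None
    return reader(location)
```

A reader for another protocol would be added with `@register_source("...")` on a new function; callers of `open_source` do not change.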
Direct Data Access to Data Repositories
• Search for metadata term (“ADCP”)
• Drag to workflow area to create a data source
• 398 hits for ‘ADCP’ located in search
OPeNDAP
• Directly access OPeNDAP servers
• Apply OPeNDAP constraints for remote data subsetting
• Current work: searchable catalogs across OPeNDAP servers
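Remote subsetting works by appending a constraint expression to the dataset URL, so the server returns only the requested slice. A small Python sketch that builds a DAP2-style projection with hyperslabs of the form `[start:stride:stop]` (stop inclusive); the server URL, the `.ascii` response suffix, and the variable name `sst` are placeholder assumptions, and some HTTP clients may additionally require the square brackets to be percent-encoded.

```python
def hyperslab(start, stop, stride=1):
    """One DAP2 hyperslab dimension, written [start:stride:stop] (stop inclusive)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def constrained_url(dataset_url, variables):
    """Build a URL requesting only a subset of each variable.

    variables maps a variable name to a list of (start, stop[, stride])
    tuples, one tuple per dimension of that variable.
    """
    projections = [
        name + "".join(hyperslab(*dim) for dim in dims)
        for name, dims in variables.items()
    ]
    return dataset_url + "?" + ",".join(projections)
```

For example, `constrained_url("http://example.org/opendap/sst.nc.ascii", {"sst": [(0, 0), (100, 200, 10), (50, 60)]})` asks for the first time step, every tenth row from 100 to 200, and columns 50 to 60, instead of downloading the whole grid.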
Gene sequences via web services
Web service executes remotely (e.g., in Japan)
Gene sequence returned in XML format
Extracted sequence can be returned for further processing
This entire workflow can be wrapped as a re-usable component so that the details of extracting sequence data are hidden unless needed.
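The step that pulls the sequence out of the web service’s XML reply can be sketched with Python’s standard `xml.etree.ElementTree`. The reply layout below (a `<sequence>` element inside an `<entry>`) and the accession value are made-up assumptions; a real sequence service defines its own schema, so the element name passed as `tag` would have to match it.

```python
import xml.etree.ElementTree as ET


def extract_sequence(xml_text, tag="sequence"):
    """Return the first <tag> element's text, whitespace removed, upper-cased."""
    root = ET.fromstring(xml_text.strip())
    node = root if root.tag == tag else root.find(f".//{tag}")
    if node is None or node.text is None:
        raise ValueError(f"no <{tag}> element with text found")
    return "".join(node.text.split()).upper()


# Hypothetical reply from a remote sequence-retrieval web service.
SAMPLE_REPLY = """
<response>
  <entry accession="EXAMPLE1">
    <sequence>
      atgcgt acgtta
      ggc
    </sequence>
  </entry>
</response>
"""
```

Wrapping this parser together with the service call is what lets the whole retrieval appear as a single reusable component, as the slide notes.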
Benthic Boundary Layer Project: Kilo Nalu, Hawaii
Benthic Boundary Layer Geochemistry and Physics at the Kilo Nalu Observatory
G. Pawlak, M. McManus, F. Sansone, E. De Carlo, A. Hebert and T. Stanton
NSF Award #OCE-0536607-000
• Research instruments are part of a cabled array at the Kilo Nalu Observatory
• Deployed off Point Panic, Honolulu Harbor, Hawai’i
• Goal: Measure the interactions between physical oceanographic forcing, sediment alteration, and modification of sediment-seawater fluxes
Accessing sensor streams at Kilo Nalu
[Figure: water temperature (bottom, 10 m ADCP); temperature in degrees C (axis 24.2 to 24.5) vs. time (01:00 to 17:00)]
Streaming data from observatory DataTurbine server
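# timestamps (seconds since 1970-01-01) and data are assumed to come from the actor's input ports
now <- Sys.time()
Epoch <- now - as.numeric(now)    # POSIXct time of the epoch origin
timeval <- Epoch + timestamps     # numeric timestamps as POSIXct times
posixtmedian <- median(timeval)
mediantime <- as.numeric(posixtmedian)
meantemp <- mean(data)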
Support application scripts in R, Matlab, etc.
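For readers who do not use R, the same window summary (median timestamp and mean temperature) can be sketched in Python. It assumes, as the R script implies, that the stream delivers timestamps as seconds since the Unix epoch; `summarize_window` is a hypothetical name, not part of Kepler or DataTurbine.

```python
from datetime import datetime, timezone
from statistics import mean, median


def summarize_window(timestamps, temperatures):
    """Summarize one window of a sensor stream.

    timestamps: seconds since the Unix epoch, one per reading.
    Returns (median time as a UTC datetime, median time in epoch seconds,
    mean temperature).
    """
    if not timestamps or len(timestamps) != len(temperatures):
        raise ValueError("need equal-length, non-empty inputs")
    median_seconds = median(timestamps)
    median_time = datetime.fromtimestamp(median_seconds, tz=timezone.utc)
    return median_time, median_seconds, mean(temperatures)
```

Returning both the datetime and the numeric value mirrors the R script, which keeps `posixtmedian` and also converts it back with `as.numeric`.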
Modular components, easily saved and shared
Graphs and derived data can be archived and displayed
Composite actors aid comprehension
• Save components for later re-use
• Share components via external repositories
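A composite actor packages a sub-workflow behind the same interface as a single actor, so a reader sees one box until they choose to open it. A hypothetical Python sketch, in which a composite is simply an ordered chain of inner steps; real Kepler composites can contain arbitrary graphs with their own director, which this sketch does not model. The `Step` and `CompositeStep` names are illustrative.

```python
class Step:
    """A single component with a one-input, one-output fire()."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, value):
        return self.func(value)


class CompositeStep(Step):
    """Hides a chain of inner steps behind one fire() call."""

    def __init__(self, name, inner_steps):
        self.inner_steps = list(inner_steps)
        super().__init__(name, self._run_inner)

    def _run_inner(self, value):
        for step in self.inner_steps:
            value = step.fire(value)
        return value

    def describe(self, indent=0):
        """Expand the hierarchy on demand, like opening a composite actor."""
        lines = [" " * indent + self.name]
        for step in self.inner_steps:
            if isinstance(step, CompositeStep):
                lines.extend(step.describe(indent + 2))
            else:
                lines.append(" " * (indent + 2) + step.name)
        return lines


clean = CompositeStep("Clean", [
    Step("drop_missing", lambda xs: [x for x in xs if x is not None]),
    Step("to_celsius", lambda xs: [round((x - 32) * 5 / 9, 2) for x in xs]),
])
pipeline = CompositeStep("Analysis", [clean, Step("mean", lambda xs: sum(xs) / len(xs))])
```

Since `CompositeStep` is itself a `Step`, a saved composite can be dropped into another workflow and nested to any depth.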
Workflow archiving and sharing
Archiving isn’t just for data...
• Kepler can archive and version:
  – Analysis code and workflows
  – Results and derived data
    • e.g., data tables, graphs, maps
  – Derived data lineage
    • What data were used as inputs
    • What processes were used to generate the derived products
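Versioning and lineage can be sketched with content hashes: each archived object gets an identifier derived from its bytes, and each derived product records which process and which inputs produced it. The `Archive` class and its methods are hypothetical illustrations in Python; Kepler’s own archive format (KAR files) and its provenance store are not modelled here.

```python
import hashlib


class Archive:
    """Hypothetical content-addressed archive with lineage records."""

    def __init__(self):
        self.objects = {}  # object id -> bytes
        self.lineage = {}  # derived id -> (process id, list of input ids)

    @staticmethod
    def object_id(data: bytes) -> str:
        # Identical bytes share an id; any edit yields a new version id.
        # Shortened to 12 hex digits for readability.
        return hashlib.sha256(data).hexdigest()[:12]

    def put(self, data: bytes) -> str:
        oid = self.object_id(data)
        self.objects[oid] = data
        return oid

    def derive(self, process_id: str, input_ids: list, output: bytes) -> str:
        """Archive a derived product and record the process and inputs behind it."""
        for oid in [process_id, *input_ids]:
            if oid not in self.objects:
                raise KeyError(f"unknown object {oid}")
        out_id = self.put(output)
        self.lineage[out_id] = (process_id, list(input_ids))
        return out_id

    def trace(self, oid: str, _seen=None) -> list:
        """All (product, process, inputs) steps behind oid, depth-first."""
        seen = set() if _seen is None else _seen
        if oid in seen or oid not in self.lineage:
            return []  # raw input, or already visited
        seen.add(oid)
        process_id, inputs = self.lineage[oid]
        steps = [(oid, process_id, inputs)]
        for parent in inputs:
            steps.extend(self.trace(parent, seen))
        return steps
```

Tracing a graph back through the table it was drawn from to the raw data answers both lineage questions on the slide: which data were inputs and which processes produced the result.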
Run Management & Sharing
• Provenance subsystem monitors data tokens
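A provenance subsystem that monitors data tokens can be pictured as a listener notified every time a token moves from one actor to another during a run. The Python sketch below is hypothetical (`ProvenanceRecorder`, `on_token`, and the actor names are illustrative, not Kepler’s provenance API); it logs each transfer with a sequence number so the run can later be queried.

```python
import itertools
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TokenEvent:
    seq: int      # order in which the token was observed
    run_id: str
    source: str   # sending actor
    target: str   # receiving actor
    value: object


@dataclass
class ProvenanceRecorder:
    run_id: str
    events: list = field(default_factory=list)
    _counter: itertools.count = field(default_factory=itertools.count, repr=False)

    def on_token(self, source, target, value):
        """Called for every token that moves between two actors."""
        event = TokenEvent(next(self._counter), self.run_id, source, target, value)
        self.events.append(event)
        return event

    def upstream_of(self, actor):
        """Actors that sent tokens directly to `actor` during this run."""
        return sorted({e.source for e in self.events if e.target == actor})

    def values_into(self, actor):
        """Token values received by `actor`, in the order they arrived."""
        return [e.value for e in self.events if e.target == actor]
```

Recording at the level of tokens, rather than whole files, is what allows each derived value to be linked to the exact inputs and actors that produced it.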
Scheduling remote execution
Viewing remote runs
Grid Computing
Grid computing
• Support for several grid technologies
  – Ad-hoc Kepler networks (Master-Slave)
  – Globus grid jobs
  – Hadoop Map-Reduce
  – SSH plumbed-HPC
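The Map-Reduce pattern listed above can be illustrated locally with Python’s `concurrent.futures`: a map step runs the same computation on independent chunks of data, and a reduce step combines the partial results. This is a single-machine stand-in for the idea, not Kepler’s Hadoop or Master-Slave actors; a thread pool plays the role of the remote nodes (CPU-heavy work would need a process pool or a real cluster to run in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(readings):
    """Map step: partial (sum, count) for one chunk of sensor readings."""
    return (sum(readings), len(readings))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(chunks, workers=4):
    """Mean over all chunks, with the map step spread across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(combine, partials, (0, 0))
    if count == 0:
        raise ValueError("no readings")
    return total / count


if __name__ == "__main__":
    chunks = [[24.0, 24.5], [25.0], [24.5, 24.0, 25.0]]
    print(distributed_mean(chunks))
```

Returning (sum, count) pairs rather than per-chunk means keeps the reduce step correct when chunks have different sizes.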
Open Source Community
Open Kepler Collaboration
• http://kepler-project.org
• Open source
  – BSD License
• Collaborators
  – UCSB, UCD, UCSD, UCB, Gonzaga, many others
• Built on Ptolemy II
Community Contribution: Kepler/WEKA
from Peter Reutemann
Community Contribution: Science Pipes
from Paul Allen, Cornell Lab of Ornithology
In summary…
• Typical analytical models are complex and difficult to comprehend and maintain
• Scientific workflows provide
  – An intuitive visual model
  – Structure and efficiency in modeling and analysis
  – Abstractions to help deal with complexity
  – Direct access to data
  – Means to publish and share models
• Kepler is an evolving but effective tool for scientists
  – Kepler/CORE award funds transition from research prototype to production software tool
Advantages of Scientific Workflows
• Mix analytical systems
  – Matlab, R, C code, FORTRAN, other executables, ...
• Understand models
  – visually depict how the analysis works
• Directly access data
• Utilize Grid and Cloud computing
• Share and version models
  – allow sharing of analytical procedures
  – document precise versions of data and models used
• Provide provenance information
  – provenance is critical to science
  – workflows are metadata about scientific process
Workflows promote reproducible science
• Scientific Workflows are metadata about process
• Document data analysis and models
  – provide provenance for data derivation
  – allow sharing of analytical details
• Publishing and citing workflows supports reproducibility of scientific results
NCEAS’ model for Open Science
From Reichman, Jones, and Schildhauer; doi:10.1126/science.1197962
![Page 68: Introduction to the Kepler Workflow System · •Goals • Produce an open-source scientific workflow system • enable scientists to design, share, and execute scientific workflows](https://reader033.fdocuments.net/reader033/viewer/2022052423/5f03db297e708231d40b19f2/html5/thumbnails/68.jpg)
Questions?
• http://www.nceas.ucsb.edu/ecoinformatics/
• http://kepler-project.org
Acknowledgments
• This material is based upon work supported by:
  • The National Science Foundation (9980154, 9904777, 0131178, 9905838, 0129792, and 0225676)
  • The National Center for Ecological Analysis and Synthesis
  • The Andrew W. Mellon Foundation
• Kepler contributors: SEEK, REAP, Kepler/CORE, Ptolemy II, and SDM/SciDAC projects
• For many shared conversations and a shared vision for Kepler:
  – Bertram Ludaescher and Tim McPhillips, UC Davis
  – Ilkay Altintas, UC San Diego
  – Mark Schildhauer, UC Santa Barbara
  – Shawn Bowers, Gonzaga University
  – Christopher Brooks, UC Berkeley
Extra slides
Sensor Network Management
Real-time Environment for Analytical Processing (REAP)
• Management and Analysis of Environmental Observatory Data using the Kepler Scientific Workflow System
http://reap.ecoinformatics.org/
REAP goals
• For scientists
  – capabilities for designing and executing complex analytical models over near-real-time and archived data sources
• For data-grid engineers
  – monitoring and management capabilities for the underlying sensor networks
• For outside users
  – access to observatory data and model results, approachable to non-scientists
Sensor sites: topology and monitoring