University of Illinois at Urbana-Champaign National Center for Supercomputing Applications An Integrated Environmental Observatory Cyberenvironment Barbara Minsker Director, Environmental Engineering, Science, & Hydrology Group, National Center for Supercomputing Applications; Principal Investigator and co-Director, CLEANER Project Office; Associate Professor, Dept of Civil & Environ. Engineering; University of Illinois, Urbana, IL, USA November 16, 2006

Environmental Cyberinfrastructure Demonstration (ECID) Project

• NSF Office of Cyberinfrastructure is funding NCSA and SDSC to:
  – Work with leading-edge communities to develop cyberinfrastructure to support science and engineering
  – Incorporate successful prototypes into a persistent cyberinfrastructure
  – NCSA’s primary focus: Cyberenvironments
• As part of this effort, the ECID project, led by Jim Myers & Barbara Minsker, is working with the WATERS community and CUAHSI Hydrologic Information System (HIS) project to create a prototype cyberenvironment for environmental observatories
  – Driven by requirements gathering and close community collaborations

Requirements Gathering

• Interviews at conferences and meetings (Tom Finholt and staff, U. of Michigan)
• Usability studies (CET, Wentling group)
• Community survey (Finholt group)
  – AEESP and CUAHSI surveyed in 2006 as proxies for environmental engineering and hydrology communities
  – 313 responses out of 600 surveys mailed (52.2% response rate)
  – Key findings are driving ECID cyberenvironment development
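The response rate quoted above can be checked with a one-line calculation. This is only an arithmetic check on the two figures from the slide; the variable names are illustrative.

```python
# Figures taken from the survey bullet above.
responses = 313
surveys_mailed = 600

# Response rate as a percentage of surveys mailed.
response_rate = 100 * responses / surveys_mailed
print(f"Response rate: {response_rate:.1f}%")  # prints "Response rate: 52.2%"
```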

What is the single most important obstacle to using data from different sources?

[Bar chart: percent of respondents choosing each obstacle (axis 0 to 100 percent). Bars are grouped as: Nonstandard/inconsistent units/formats; Metadata problems; Other obstacles.]

Obstacles shown:
• Non-standard spatial scales
• Irregular or different time steps
• Consistency of metadata
• Unknown or inconsistent units
• Investigator who collected the data is unknown to me
• Other
• Existence of metadata
• Not applicable
• Processing the data from raw form into variables that can be used by other tools
• Non-standard data formats
• Learning how to quality control the data

• 55% concerned about insufficient credit for shared data
• N=278

Shows a need for an integrated cyberenvironment with provenance.

What three software packages do you use most frequently in your work?

[Bar chart: percent of respondents naming each package, shown separately for AEESP and CUAHSI (axis 0 to 100 percent).]

Packages shown:
• SQL/Server
• SPSS
• MS Access
• SAS
• MATLAB
• ArcGIS
• Other*
• Excel

*Other:
• MS Word
• MS PowerPoint
• Statistics applications (e.g., Stata, R, S-Plus)
• SigmaPlot
• PHREEQC
• MathCAD
• FORTRAN compiler
• Mathematica
• GRASS GIS
• Groundwater models
• Modflow

Majority are not using high-end computational tools.

Factors influencing technology adoption

[Bar chart: percent of respondents citing each factor, shown separately for AEESP and CUAHSI (axis 0 to 100 percent).]

Factors shown:
• Necessity of creating an account
• Having to install software on my personal computer (rather than accessing everything through a Web …
• Security of my personal information
• Ability to access and modify source code (e.g., for models or workflows)
• Upgrades for long-term use
• Speed of loading the CyberCollaboratory pages on my computer (with the Internet connection speed that I …
• Necessity of learning new tools
• Compatibility with existing tools that I use
• Stability of software for long-term use
• Fulfillment of my current research needs
• Professional technical support
• Ability to do things I cannot do with current software/hardware
• Clarity of interface/ease of use

Ease of use, good support, and new capabilities are essential.

What are the three most compelling factors that would lead you to collaborate with another person in your field?

[Bar chart: percent of respondents citing each factor, shown separately for AEESP and CUAHSI (axis 0 to 100 percent).]

Factors shown:
• Shared methods
• Other
• Career advancement
• Preference for working with others
• Proximity of the person
• Access to models
• Leveraging funding by combining budgets
• Shared values
• Access to data
• Access to equipment (e.g., sensors, computers)
• Trusting the person
• Opportunity to brainstorm ideas with others
• Shared interests
• Access to another’s expertise
• Complementary areas of expertise

Community seeks collaborations to gain different expertise.

Environmental CI Architecture: Research Services

[Architecture diagram]

Research Process:
• Create Hypothesis
• Obtain Data
• Analyze Data &/or Assimilate into Model(s)
• Link &/or Run Analyses &/or Model(s)
• Discuss Results
• Publish

Supporting Technology:
• Knowledge Services
• Data Services
• Workflows & Model Services
• Meta-Workflows
• Collaboration Services
• Digital Library

Diagram labels: Integrated CI; ECID Project Focus: Cyberenvironments; HIS Project Focus

Cyberenvironments

• Couple traditional desktop computing environments with the resources and capabilities of a national cyberinfrastructure

• Provide unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary and geographical boundaries.

• ECID is demonstrating how cyberenvironments can:
  – Support observatory sensor and event management, workflow and scientific analyses, and knowledge networking, including provenance information to track data from creation to publication.
  – Provide collaborative environments where scientists, educators, and practitioners can acquire, share, and discuss data and information.
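To make "provenance information to track data from creation to publication" concrete, here is a minimal sketch of a lineage record. It is an illustration only and does not reflect the ECID or Tupelo data model; the `Artifact` class, the `lineage` function, and the example artifact names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A data product plus pointers to the artifacts it was derived from."""
    name: str
    produced_by: str  # tool, workflow, or instrument that created it
    derived_from: List["Artifact"] = field(default_factory=list)


def lineage(artifact: Artifact) -> List[str]:
    """Walk back from an artifact to its original sources (depth-first)."""
    trail = [f"{artifact.name} <- {artifact.produced_by}"]
    for parent in artifact.derived_from:
        trail.extend(lineage(parent))
    return trail


# Hypothetical chain: raw readings -> quality-controlled series -> figure.
raw = Artifact("raw sensor readings", "observatory sensor")
cleaned = Artifact("quality-controlled series", "QA/QC workflow", [raw])
figure = Artifact("published figure", "analysis workflow", [cleaned])

for step in lineage(figure):
    print(step)
```

Because every artifact keeps a reference to its inputs, the question "where did this published result come from?" becomes a simple traversal.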

ECID CyberEnvironment Components

• CyberCollaboratory: Collaborative Portal
• CyberIntegrator: Exploratory Workflow Integration
• CI-KNOW: Network Browser/Recommender
• Tupelo Metadata Services
• Community Event Management/Processing
• SSO: Single Sign-On Security (coming)
• CUAHSI HIS Data Services

CyberCollaboratory

• The CyberCollaboratory is a web portal to allow sharing of information and ideas across the community.

• Currently being used by CLEANER Project Office

To check out the public view of CyberCollaboratory, create an account at

http://cleaner.ncsa.uiuc.edu/cybercollab

CyberIntegrator

• Studying complex environmental systems requires:
  – Coupling analyses and models
  – Real-time, automated updating of analyses and modeling with diverse tools
• CyberIntegrator is a prototype technology to support exploratory modeling and analysis of complex systems. Integrates the following tools to date:
  – Excel
  – Im2Learn image processing and mining tools, including ArcGIS image loading
  – D2K data mining
  – Java codes, including event management tools
• Additional tools will be added, based on high priority needs of beta users. Some options: ArcGIS, OpenMI model integration, Fortran codes, Matlab, Kepler, generic tools for running executables and Web services
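The idea of coupling heterogeneous tools behind a common "executor" interface can be sketched as a small step pipeline. This is not CyberIntegrator code: the `Workflow` class and the two stand-in executors (`drop_missing`, `mean`) are hypothetical and simply show each step's output feeding the next.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class Workflow:
    """Runs steps in order, feeding each step's output to the next."""

    def __init__(self) -> None:
        self.steps: List[Step] = []
        self.log: List[str] = []  # labels of the steps that have run

    def add_step(self, label: str, executor: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((label, executor))
        return self  # returning self allows chained add_step calls

    def run(self, data: Any) -> Any:
        for label, executor in self.steps:
            data = executor(data)
            self.log.append(label)
        return data


# Stand-ins for tool-specific executors (e.g. a spreadsheet or image tool).
def drop_missing(values):
    return [v for v in values if v is not None]


def mean(values):
    return sum(values) / len(values)


workflow = Workflow().add_step("clean", drop_missing).add_step("summarize", mean)
result = workflow.run([2.0, None, 4.0])
print(result, workflow.log)
```

In this sketch, adding a tool means writing one more function with the same call shape.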

CyberIntegrator Architecture

Example of CyberIntegrator Use:

Carrie Gibson created a fecal coliform prediction model in ArcGIS using Model Builder that predicts annual average concentrations.

Ernest To rewrote the model as a macro in Excel to perform Monte Carlo simulation to predict median and 90th percentile values.

CyberIntegrator’s goal: Reduce manual labor in linking these tools, visualizing the results, and updating in real time.
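The Monte Carlo step described above (sample uncertain inputs, run the model many times, report the median and 90th percentile) can be sketched as follows. This is not the Excel macro or the ArcGIS model: the placeholder model (concentration = load / flow), the lognormal input distributions, and all parameter values are assumptions chosen only to make the example runnable.

```python
import random
import statistics


def simulate_concentrations(n_trials: int, seed: int = 0) -> list:
    """Run a placeholder concentration model on randomly sampled inputs."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    results = []
    for _ in range(n_trials):
        flow = rng.lognormvariate(2.0, 0.8)  # hypothetical streamflow
        load = rng.lognormvariate(6.0, 0.5)  # hypothetical bacterial load
        results.append(load / flow)          # assumed model, for illustration
    return results


samples = simulate_concentrations(10_000)
median = statistics.median(samples)
# quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
p90 = statistics.quantiles(samples, n=10)[-1]
print(f"median={median:.1f}, 90th percentile={p90:.1f}")
```

Reporting percentiles of the simulated distribution, rather than a single annual average, is what lets the result express variability in the inputs.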

Real-Time Simulation of Copano Bay TMDL with CyberIntegrator

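[Workflow diagram: CyberIntegrator runs four numbered steps (1 to 4) through an Excel Executor and an Im2Learn Executor; arrows are labeled "call" and "data".]

Inputs:
• USGS Daily Streamflows (web services)
• Shapefiles for Copano Bay

Workflow steps shown:
• Streamflows to Distributions (Excel)
• Fecal Coliform Concentrations Model (Excel)
• Load Shapefiles (Im2Learn)
• Geo-reference and Visualize Results (Im2Learn)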

Sensor Anomaly Detection Scenario

[Scenario diagram. Components: CC Bay Sensor Monitor Page; CCBay Sensor Map; Dashboard; CyberIntegrator; Event Manager; Anomaly Detector 1; Anomaly Detector 2; CI-KNOW Network. Flows between components: Sensor data; Anomalies.]

Diagram annotations:
• User subscribes to anomaly detector workflows
• Listens for data events & creates event when anomaly discovered.
• Alerts user to anomaly detection, along with other events (logged-in users, new documents, etc.)
• Sensor map shows nearby related sensors so user can check data. Anomaly detector is faulty. CI-KNOW recommends alternate anomaly detector from Chesapeake Bay observatory.
• CyberIntegrator loads recommended workflow. User adjusts parameters to CCBay Sensor.
• Shares workflow to server
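The scenario's core pattern (a detector subscribes to sensor data events and publishes an anomaly event that other subscribers receive) can be sketched in a few lines. This is an in-process stand-in, not the ECID event manager or its messaging layer: the `EventManager` class, the topic names, and the simple standard-deviation rule are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


class EventManager:
    """Minimal in-process publish/subscribe hub (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._subscribers[topic]):
            handler(payload)


class AnomalyDetector:
    """Flags readings far from the mean of previously accepted readings."""

    def __init__(self, events, threshold=3.0, min_history=5):
        self.events = events
        self.threshold = threshold      # allowed deviation, in std. devs
        self.min_history = min_history  # readings needed before judging
        self.history = []
        events.subscribe("sensor/data", self.on_data)

    def on_data(self, value):
        if len(self.history) >= self.min_history:
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                self.events.publish("sensor/anomaly", value)
                return  # keep the outlier out of the baseline
        self.history.append(value)


events = EventManager()
alerts = []  # stands in for a dashboard that shows alerts to the user
events.subscribe("sensor/anomaly", alerts.append)
detector = AnomalyDetector(events)

for reading in [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]:
    events.publish("sensor/data", reading)
print(alerts)
```

Because the detector and the alert consumer only share topic names, a faulty detector can be swapped for another one without changing the subscriber side, which is the substitution the scenario describes.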

Demonstrations…

Cyberenvironment Technologies

[Technology diagram]

Components:
• JMS Broker (ActiveMQ 4.0.1)
• CyberDashboard Desktop Application
• CyberCollaboratory
• CI-KNOW
• CyberIntegrator
• Workflow Service
• Tupelo: Semantic Content, Provenance, Event Topics, Workflow Templates, User Subscriptions
• ECID Managed Data/Metadata
• RDBMS: Metadata, Anomalies, Data

Connection labels (with the protocol shown next to each):
• Workflow Publication/Retrieval (Web Services)
• Raw Data (JMS)
• Anomaly Subscription (JMS)
• Data and Anomaly Subscriptions (JMS)
• Recommender Network Web Service (SOAP)
• Workflow Reference (URL)
• Data Subscriptions (JMS)
• Anomaly Publication (JMS)
• CyberIntegrator Workflow (SOAP)
• Sensor Page Reference (URL)

ECID & WATERS Testbeds

• Two technologies in the Cyberenvironment are ready for beta testing: CyberCollaboratory and CyberIntegrator
• We invite WATERS testbed projects to become beta testers:
  – Use the beta software starting January 1st. We will work with you to create CyberIntegrator executors for your tools (do it yourself with our open source code or we’ll do it)
  – Provide feedback to our developers on usability and functionality
  – Ongoing software modifications will be made in response to feedback
• To date, 4 projects agreed to participate in beta testing
  – 3 WATERS testbeds: Corpus Christi Bay, Chesapeake Bay, Utah
  – UNESCO IHE researchers
• Interested in joining the beta testing?
  – See Luigi Marini for more details, or e-mail him at [email protected]

Conclusions

• The ECID Cyberenvironment demonstrates the benefits of end-to-end integration of cyberinfrastructure and desktop tools, including:
  – HIS-type data services
  – Workflow
  – Event management
  – Provenance and knowledge management, and
  – Collaboration for supporting environmental researchers, educators, and outreach partners (e.g., policy makers)
• This creates a powerful system for linking observatory operations with flexible, investigator-driven research in a community framework (i.e., the national network).
  – Workflow and knowledge management support testing hypotheses across observatories
  – Provenance supports QA/QC and rewards for community contributions in an automated fashion.

Acknowledgments

• Contributors:
  – NCSA ECID team (Peter Bajcsy, Noshir Contractor, Steve Downey, Joe Futrelle, Hank Green, Rob Kooper, Yong Liu, Luigi Marini, Jim Myers, Mary Pietrowicz, Tim Wentling, York Yao, Inna Zharnitsky)
  – Corpus Christi Bay Testbed team (PIs: Jim Bonner, Ben Hodges, David Maidment, Paul Montagna)
• Funding sources:
  – NSF grants BES-0414259, BES-0533513, and SCI-0525308
  – Office of Naval Research grant N00014-04-1-0437