Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler U.S. Department of the Interior U.S. Geological Survey Michael P. Finn High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014


Page 1: Geospatial Analytics for Government Agencies and the General Public:  The CyberGIS Toolkit as an Enabler

Geospatial Analytics for Government Agencies and the General Public:

The CyberGIS Toolkit as an Enabler

U.S. Department of the Interior
U.S. Geological Survey

Michael P. Finn

High Performance Computing and Geospatial Analytics Workshop
Argonne National Laboratory
29 – 30 Apr 2014

Page 2

Collaborators

• Shaowen Wang, Anand Padmanabhan, Yan Liu
  – University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
• David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel
  – USGS, Center of Excellence for Geospatial Information Science (CEGIS)
• Kristina H. Yamamoto
  – USGS, National Geospatial Technical Operations Center
• Babak Behzad
  – UIUC, Department of Computer Science
• Eric Shook
  – Kent State University, Department of Geography
• Qingfeng (Gene) Guan
  – China University of Geosciences

Page 3

Where Do We Want to Go?

• Geospatial Analytics
  – Spatial Modeling
  – Geovisualization (GeoViz/ Visual Analytics)
• For Decision Makers (agencies/ citizens)
  – Protect natural resources
  – Empower cultures
  – Provide for our future

Page 4

Geospatial Analytics
Spatial Modeling/ Geovisualization

Page 5

Data / Software
Geospatial Methods, Technologies, and Applications
GIScience and Cyberinfrastructure
Geospatial Toolkits
Geospatial Analytics (Spatial Modeling / GeoViz)

So:
- Where have we been?
- Where are we now?
- Where do we want to go?

Page 6

Data

• Analog → Digital
• “Big” Data
• Spatial Data (geometric structure)
• Data: Open? Mostly
  – Findable, Accessible, Exploitable (standard format)
• Example: USGS data holdings
  – 8 layers of The National Map
  – Soon: hyperspectral cubes and LiDAR point clouds

Page 7

The National Map - Elevation: Quality Levels

Quality Level | Horizontal Point Spacing (meters) | Vertical Accuracy (centimeters) | Description
1 | 0.35 | 9.25 | High accuracy and resolution lidar (example: lidar data collected in the Pacific Northwest)
2 | 0.7 | 9.25 | Medium-high accuracy and resolution lidar
3 | 1-2 | <18.5 | Medium accuracy and resolution lidar, analogous to USGS specification v. 13 and most data collected to date
4 | 5 | 46-139 | Early or lower quality lidar and photogrammetric elevations produced from aerotriangulated NAIP imagery
5 | 5 | 93-185 | Lower accuracy and resolution, primarily from IfSAR

http://nationalmap.gov/3DEP/neea.html
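
The point spacings in the quality-level table can be related to the point densities quoted elsewhere in the deck (8 and 2 points per square meter for levels 1 and 2). The sketch below is only an approximation: it assumes points laid out on a regular square grid, so density is the inverse of the spacing squared. The function name `point_density` is hypothetical and not part of any USGS tool.

```python
def point_density(spacing_m: float) -> float:
    """Approximate points per square meter for a nominal point spacing.

    Assumption: points sit on a regular square grid, so each point
    occupies spacing_m * spacing_m square meters.
    """
    return 1.0 / spacing_m ** 2


# Nominal spacings for quality levels 1 and 2, taken from the table above.
for level, spacing in [(1, 0.35), (2, 0.7)]:
    density = point_density(spacing)
    print(f"Quality level {level}: {spacing} m spacing, about {density:.1f} points per square meter")
```

Under that assumption, the level 1 and level 2 spacings round to roughly 8 and 2 points per square meter.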

Page 8

Big Spatial Data

• Geographic data of high resolution and covering large areas create big spatial data
• Remotely-sensed images
  – One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage space (more than 4 PB equivalent for U.S.)
  – Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data
  – Satellite images with finer than one meter resolution
  – LiDAR data of level 1 (8 points per square meter), level 2 (2 points per square meter)
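
The national NAIP estimate on this slide can be checked by scaling the Dent County figure by area. This is a rough sketch, not the authors' calculation: the U.S. area of about 9.83 million km² is an assumption added here, storage is assumed to scale linearly with area, and decimal units (1 PB = 1,000,000 GB) are used.

```python
# Figures from the slide.
DENT_COUNTY_AREA_KM2 = 1_955
DENT_COUNTY_STORAGE_GB = 800

# Assumption (not from the slides): approximate total area of the United States.
US_AREA_KM2 = 9_830_000


def scale_storage_gb(storage_gb: float, area_km2: float, target_area_km2: float) -> float:
    """Scale a storage figure linearly with covered area."""
    return storage_gb * target_area_km2 / area_km2


us_storage_gb = scale_storage_gb(DENT_COUNTY_STORAGE_GB, DENT_COUNTY_AREA_KM2, US_AREA_KM2)
us_storage_pb = us_storage_gb / 1_000_000  # decimal petabytes
print(f"Estimated national NAIP storage: about {us_storage_pb:.1f} PB")
```

With these inputs the estimate lands slightly above 4 PB, in line with the slide.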

Page 9

Big Spatial Data

• USGS 3DEP: Level 2 LiDAR for all of U.S. except Alaska, which is acquiring level 5 IfSAR
  – Data volume for point cloud, intensity images, and bare Earth elevation model: 7 to 9 petabytes
  – Processing and file creation usually doubles to triples the storage requirements
• Other geospatial data: USGS National Hydrography Dataset based on 1:24,000 scale, about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE)
• New project to extract hydrography from level 2 lidar
  – How big will the resulting vector dataset (< 1 m resolution) be?

Page 10

Software

• Compiled/ scripting computer languages
  – Manipulate data
• Software
  – Commercial? Open? Modifiable code? Functional?
• Tools: SAS (SPSS)/ R/ MATLAB, etc.
• GIS Software: Esri ArcGIS/ QGIS
  – and image processing S/W: Imagine/ ENVI
  – Libraries: GDAL
• Example software: mapIMG (based on GCTP; open)

Page 11

Geospatial Methods, Technologies, and Applications

• Analytical Cartography
  – Mathematical Cartography
  – Since roughly the 18th Century
• Quantitative Geography
  – Since 1960s
• GIS (and image processing S/W)
  – Since about the 1970s
  – Combining data & software into GIS packages
  – Legacy of primarily commercial software
• Open Source Software
  – Since roughly 1980s
• OpenGIS?
  – Early wide-spread but often spotty “open” GIS
  – Foundation for maturity, expansion, and further openness

Page 12

Here we are/ where are we going?

• Open GIS: Technology and Applications (exploitable)
• Hardware and Operating Systems evolving
• Data Storage trying to keep pace with Big Data
• Advanced GeoViz on cusp of exploding
• HPC High-Performance Spatial Computing
• Increasing Spatiotemporal fidelity
• Cyberinfrastructure

Page 13

CyberGIS

• Cyberinfrastructure (eScience)
• HPC & GIScience
• A balance/ interaction between theory/ data (Rey, 2013)
• Collaborative Research
• Standards (for interoperability)

Page 14

NSF CyberGIS Project

• NSF Software Infrastructure for Sustained Innovation Award
  – http://cybergis.org
• USGS/ CEGIS Participation
• Cyberinfrastructure resources
  – XSEDE
  – Blue Waters supercomputer allocation
  – Open Science Grid
• Integration
  – CyberGIS Toolkit
  – CyberGIS Gateway
  – GISolve middleware services

Page 15

CyberGIS Software Environment

From Liu et al. (2014)

Page 16

CyberGIS Toolkit

• Software Components
  – PABM: Parallel Agent-Based Modeling
  – pRasterBlaster: Parallel Map Reprojection
  – Parallel PySAL (Python Spatial Analysis Library)
  – Spatial Text
• An open and reliable software toolbox for high-end users
• Hide compute complexity
• A rigorous software building, testing, packaging, and deployment framework
• Focused on computational intensity, performance, scalability, and portability in various CI environments
• Easy to configure and use

Page 17

Scalable Raster Processing

• Need for scalable map reprojection in CyberGIS analytics
  – Spatial analysis and modeling
    • Distance calculation on raster cells requires appropriate projection
  – Visualization
    • Reprojection for faster visualization on Web Mercator base maps
• pRasterBlaster integration in CyberGIS Toolkit and Gateway
  – Software componentization: librasterblaster, pRasterBlaster, MapIMG
  – Build, test, and documentation
  – Gateway user interface
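
To make the Web Mercator reprojection mentioned above concrete, here is a minimal sketch of the standard spherical Web Mercator forward formulas for a single longitude/latitude pair. It is not taken from pRasterBlaster or librasterblaster; the function name and the choice to return `None` beyond the usual latitude cutoff are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius
MAX_LAT_DEG = 85.051129     # conventional Web Mercator latitude cutoff


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float):
    """Project geographic degrees to Web Mercator meters.

    Returns (x, y), or None when the latitude lies beyond the cutoff
    (an illustrative convention, not a library behavior).
    """
    if abs(lat_deg) > MAX_LAT_DEG:
        return None
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(lonlat_to_web_mercator(-87.98, 41.71))  # a point near Argonne National Laboratory
print(lonlat_to_web_mercator(0.0, 89.0))      # beyond the cutoff, returns None
```

A raster reprojection applies a transform of this kind (usually its inverse) to every output cell, which is why the per-cell cost matters at scale.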


Page 18

Performance Profiling

• Performance profiling is an important tool for developing scalable and efficient high performance applications

• Performance profiling identified computational bottlenecks in pRasterBlaster

• Demonstration of one example of the value of profilers for pRasterBlaster in the next slides

Page 19

A Computational Bottleneck: Symptom

Page 20

A Computational Bottleneck: Symptom

Page 21

A Computational Bottleneck: Cause

Page 22

A Computational Bottleneck: Analysis

• Spatial data-dependent performance anomaly
  – The anomaly is data dependent
  – Four corners of the raster dataset were processed by processors whose indexes are close to the two ends
• Exception handling in C++ is costly
  – Coordinate transformation on nodata area was handled as an exception
• Solution
  – Remove C++ exception handling part
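
The fix described here was made in C++ code that is not shown in the deck. The Python sketch below only illustrates the control-flow pattern: the inverse transform reports an off-map cell through its return value, and the caller writes a nodata value, rather than raising and catching an exception for every such cell. The sinusoidal projection, the function names, and the nodata value are all assumptions chosen for illustration, and Python's exception costs differ from those of C++.

```python
import math


def inverse_sinusoidal(x: float, y: float):
    """Invert the unit-sphere sinusoidal projection.

    Returns (lon, lat) in radians, or None when (x, y) falls outside the
    projected globe, as happens in the corners of the bounding rectangle.
    """
    lat = y
    if abs(lat) >= math.pi / 2:
        return None  # simplification: the poles themselves are treated as off-map
    lon = x / math.cos(lat)
    if abs(lon) > math.pi:
        return None
    return lon, lat


def reproject_row(xs, y, sample, nodata=-9999.0):
    """Fill one output row, writing nodata for cells with no valid source location."""
    row = []
    for x in xs:
        lonlat = inverse_sinusoidal(x, y)
        row.append(nodata if lonlat is None else sample(*lonlat))
    return row


# A row near the top of the output raster: the center cell is valid, the edge cell is not.
print(reproject_row([0.0, math.pi], 1.5, lambda lon, lat: 1.0))
```

Because the invalid case is an ordinary branch, rows dominated by off-map cells cost no more per cell than rows in the middle of the raster.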


Page 23

A Computational Bottleneck: Performance Improvement

Page 24

A Computational Bottleneck: Summary

• Symptom
  – Processors responsible for polar regions spent more time than those processing the equatorial region
• Cause
  – Corner cells were mapped to invalid input raster cells, generating exceptions
  – C++ exception handling was expensive
• Solution
  – Removed C++ exception handling
  – Corner cells need not be processed
    • They now contribute less computation time

Page 25

pRasterBlaster Component View

[Diagram. Components: librasterblaster, pRasterBlaster, MapIMG. Audiences: Cyberinfrastructure Service Providers, GIS Programmers, End Users. Other labels: via API, CyberToolkit.]

Page 26

Performance

Test:
- On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center)
- Using a parallel file system (Lustre) and MPI I/O (vs. traditional Network File System (NFS))
- 40 GB data
- Processor cores were increased from 256 to 1024

Page 27

Obstacles, Issues, Challenges

• Parallel I/O (particularly raster) is the proverbial long pole in the tent
• Raster decomposes nicely (embarrassingly parallel)
• File I/O (especially output file re-composition) is a huge bottleneck
• Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster
  – GeoTIFF (SPTW – Simple Parallel TIFF Writer) led by David Mattli, USGS
  – HDF5 parallel work by Babak Behzad, UIUC
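
The following sketch illustrates why raster work decomposes easily while output re-composition is harder. It splits raster rows into contiguous strips, one per process, and computes the byte offset at which each strip would begin in a shared output file. This is not how SPTW or pRasterBlaster is implemented; the function names are hypothetical, and the offset calculation assumes an uncompressed, row-major layout with no file header.

```python
def partition_rows(n_rows: int, n_procs: int):
    """Split n_rows into n_procs contiguous (start, stop) ranges.

    Range sizes differ by at most one row, so the work is balanced
    when every row costs about the same.
    """
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def strip_byte_offset(start_row: int, n_cols: int, bytes_per_cell: int) -> int:
    """Byte offset of a strip's first cell (assumes uncompressed row-major data, no header)."""
    return start_row * n_cols * bytes_per_cell


# Example: a 10-row, 100-column raster of 4-byte cells shared among 4 processes.
for rank, (start, stop) in enumerate(partition_rows(10, 4)):
    offset = strip_byte_offset(start, n_cols=100, bytes_per_cell=4)
    print(f"rank {rank}: rows {start} to {stop - 1}, writes at byte offset {offset}")
```

Each process can compute its strip independently, but all of them must then write into one file at non-overlapping offsets, which is the step that parallel file systems and MPI I/O are meant to serve.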

Page 28

Computational Challenges

• Converting legacy (linear) code to HPC (parallel) environment requires a lot of skilled manpower

• Scaling to large-scale analysis using HPC resources is difficult

• Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise on computational performance profiling and analysis


Page 29

Geospatial Analytics
Spatial Modeling/ Geovisualization

• Solving “Changing World” Problems
• Smart Decisions
• Protecting Natural Resources
• Democratizing Science
• Empowering cultures
• Products and Services for society and its citizens

Data & Software Solving (Geospatial) Problems

Page 30

Geospatial Analytics
Spatial Modeling/ Geovisualization

Page 31

References

• Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at the Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
• Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
• Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.
• Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland.
• Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256.
• Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.

• http://cegis.usgs.gov/
• http://nationalmap.gov/3DEP/
• http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster

Page 32

Geospatial Analytics for Government Agencies and the General Public:

The CyberGIS Toolkit as an Enabler

U.S. Department of the Interior
U.S. Geological Survey

Questions?

http://cegis.usgs.gov/index.html

High Performance Computing and Geospatial Analytics Workshop
Argonne National Laboratory
29 – 30 Apr 2014