Geospatial Analytics for Government Agencies and the General Public:
The CyberGIS Toolkit as an Enabler
U.S. Department of the Interior
U.S. Geological Survey
Michael P. Finn
High Performance Computing and Geospatial Analytics Workshop
Argonne National Laboratory
29 – 30 Apr 2014
Collaborators
• Shaowen Wang, Anand Padmanabhan, Yan Liu
  – University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
• David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel
  – USGS, Center of Excellence for Geospatial Information Science (CEGIS)
• Kristina H. Yamamoto
  – USGS, National Geospatial Technical Operations Center
• Babak Behzad
  – UIUC, Department of Computer Science
• Eric Shook
  – Kent State University, Department of Geography
• Qingfeng (Gene) Guan
  – China University of Geosciences
Where Do We Want to Go?
• Geospatial Analytics
  – Spatial Modeling
  – Geovisualization (GeoViz/ Visual Analytics)
• For Decision Makers (agencies/ citizens)
  – Protect natural resources
  – Empower cultures
  – Provide for our future
Geospatial Analytics
Spatial Modeling/ Geovisualization
[Diagram layers: Data / Software; Geospatial Methods, Technologies, and Applications; GIScience and Cyberinfrastructure; Geospatial Toolkits; Geospatial Analytics (Spatial Modeling / GeoViz)]
So:
- Where have we been?
- Where are we now?
- Where do we want to go?
Data
• Analog → Digital
• “Big” Data
• Spatial Data (geometric structure)
• Data: Open? Mostly
  – Findable, Accessible, Exploitable (standard format)
• Example: USGS data holdings
  – 8 layers of The National Map
  – Soon: hyperspectral cubes and LiDAR point clouds
The National Map - Elevation: Quality Levels
Quality Level | Horizontal Point Spacing (meters) | Vertical Accuracy (centimeters) | Description
1 | 0.35 | 9.25 | High accuracy and resolution lidar; example: lidar data collected in the Pacific Northwest
2 | 0.7 | 9.25 | Medium-high accuracy and resolution lidar
3 | 1-2 | <18.5 | Medium accuracy and resolution lidar, analogous to USGS specification v. 13 and most data collected to date
4 | 5 | 46-139 | Early or lower quality lidar and photogrammetric elevations produced from aerotriangulated NAIP imagery
5 | 5 | 93-185 | Lower accuracy and resolution, primarily from IfSAR
http://nationalmap.gov/3DEP/neea.html
Big Spatial Data
• High-resolution geographic data covering large areas create big spatial data
• Remotely sensed images
  – One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage space (more than 4 PB equivalent for the U.S.)
  – Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data
  – Satellite images with finer than one-meter resolution
  – LiDAR data of level 1 (8 points per square meter) and level 2 (2 points per square meter)
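The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes lidar points lie on a roughly regular grid, so density is the inverse square of the point spacing, and it scales the Dent County NAIP figure linearly by area. The U.S. area of about 9.8 million km² is an assumed round number, not a value from the slides.

```python
# Sketch: relate lidar point spacing to point density, and scale a storage
# figure linearly by area. Constants marked ASSUMED are not from the slides.

def points_per_square_meter(point_spacing_m: float) -> float:
    """Approximate density of a regular grid of points with the given spacing."""
    return 1.0 / (point_spacing_m ** 2)


def scale_storage_by_area(sample_gb: float, sample_km2: float, target_km2: float) -> float:
    """Linearly scale a storage figure (GB) from a sample area to a target area."""
    return sample_gb * (target_km2 / sample_km2)


QL1_SPACING_M = 0.35           # quality level 1 point spacing
QL2_SPACING_M = 0.7            # quality level 2 point spacing
DENT_COUNTY_KM2 = 1955.0       # area quoted on the slide
DENT_COUNTY_NAIP_GB = 800.0    # storage quoted on the slide
ASSUMED_US_AREA_KM2 = 9.8e6    # ASSUMED approximate total U.S. area

ql1_density = points_per_square_meter(QL1_SPACING_M)
ql2_density = points_per_square_meter(QL2_SPACING_M)
us_naip_pb = scale_storage_by_area(
    DENT_COUNTY_NAIP_GB, DENT_COUNTY_KM2, ASSUMED_US_AREA_KM2
) / 1e6  # decimal petabytes (1 PB = 1e6 GB)

print(f"QL1 density: {ql1_density:.1f} points per square meter")
print(f"QL2 density: {ql2_density:.1f} points per square meter")
print(f"U.S. NAIP estimate: {us_naip_pb:.2f} PB")
```

Under these assumptions the densities round to the 8 and 2 points per square meter quoted for levels 1 and 2, and the area scaling lands slightly above 4 PB.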
Big Spatial Data
• USGS 3DEP
  – Level 2 LiDAR for all of the U.S. except Alaska, which is acquiring level 5 IfSAR
  – Data volume for point cloud, intensity images, and bare Earth elevation model: 7 to 9 petabytes
  – Processing and file creation usually doubles to triples the storage requirements
• Other geospatial data
  – USGS National Hydrography Dataset based on 1:24,000 scale: about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE)
• New project to extract hydrography from level 2 lidar
  – How big will the resulting vector dataset (< 1 m resolution) be?
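As a rough planning aid, the 3DEP volume and the processing overhead can be combined into a storage range. This sketch reads "doubles to triples" as a total of two to three times the delivered data volume, which is an interpretation of the slide rather than a stated USGS figure.

```python
# Sketch: total storage range when processing multiplies the delivered volume.
# The 2x and 3x factors are one reading of "doubles to triples".

def working_storage_range(raw_low_pb, raw_high_pb, low_factor=2.0, high_factor=3.0):
    """Return (minimum, maximum) total storage in petabytes."""
    return raw_low_pb * low_factor, raw_high_pb * high_factor


low_pb, high_pb = working_storage_range(7.0, 9.0)
print(f"Estimated total storage: {low_pb:.0f} to {high_pb:.0f} PB")
```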
Software
• Computer compiled/ scripting languages
  – Manipulate data
• Software
  – Commercial? Open? Modifiable code? Functional?
• Tools: SAS (SPSS)/ R/ MATLAB, etc.
• GIS Software: Esri ArcGIS/ QGIS
  – and image processing S/W: Imagine/ ENVI
  – Libraries: GDAL
• Example software: mapIMG (based on GCTP; open)
Geospatial Methods, Technologies, and Applications
• Analytical Cartography
  – Mathematical Cartography
  – Since roughly the 18th Century
• Quantitative Geography
  – Since 1960s
• GIS (and image processing S/W)
  – Since about the 1970s
  – Combining data & software → GIS packages
  – Legacy of primarily commercial software
• Open Source Software
  – Since roughly 1980s
• OpenGIS?
  – Early, widespread, but often spotty “open” GIS
  – Foundation for maturity, expansion, and further openness
Here we are/ where are we going?
• Open GIS: Technology and Applications (exploitable)
• Hardware and Operating Systems evolving
• Data Storage trying to keep pace with Big Data
• Advanced GeoViz on cusp of exploding
• HPC → High-Performance Spatial Computing
• Increasing Spatiotemporal fidelity
• Cyberinfrastructure
CyberGIS
• Cyberinfrastructure (eScience)
• HPC & GIScience
• A balance/ interaction between theory/ data (Rey, 2014)
• Collaborative Research
• Standards (for interoperability)
NSF CyberGIS Project
• NSF Software Infrastructure for Sustained Innovation Award
  – http://cybergis.org
• USGS/ CEGIS Participation
• Cyberinfrastructure resources
  – XSEDE
  – Blue Waters supercomputer allocation
  – Open Science Grid
• Integration
  – CyberGIS Toolkit
  – CyberGIS Gateway
  – GISolve middleware services
CyberGIS Software Environment
From Liu et al. (2014)
CyberGIS Toolkit
• Software Components
  – PABM: Parallel Agent-Based Modeling
  – pRasterBlaster: Parallel Map Reprojection
  – Parallel PySAL (Python Spatial Analysis Library)
  – Spatial Text
• An open and reliable software toolbox for high-end users
• Hide compute complexity
• A rigorous software building, testing, packaging, and deployment framework
• Focused on computational intensity, performance, scalability, and portability in various CI environments
• Easy to configure and use
Scalable Raster Processing
• Need for scalable map reprojection in CyberGIS analytics
  – Spatial analysis and modeling
    • Distance calculation on raster cells requires appropriate projection
  – Visualization
    • Reprojection for faster visualization on Web Mercator base maps
• pRasterBlaster integration in CyberGIS Toolkit and Gateway
  – Software componentization: librasterblaster, pRasterBlaster, MapIMG
  – Build, test, and documentation
  – Gateway user interface
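To make the reprojection step concrete, here is a minimal sketch of the spherical Web Mercator transform and its inverse, plus a helper that maps an output raster cell back to longitude and latitude. Raster reprojection commonly walks the output grid and uses the inverse transform to find the source location for each cell. This is an illustration in Python, not code from librasterblaster, and all function names are hypothetical.

```python
import math

# Sphere radius used by Web Mercator (equal to the WGS 84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward transform: geographic degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse transform: Web Mercator meters to geographic degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon_deg, lat_deg


def output_cell_to_lonlat(col, row, upper_left_x, upper_left_y, cell_size_m):
    """Hypothetical helper: center of an output cell mapped back to lon/lat."""
    x = upper_left_x + (col + 0.5) * cell_size_m
    y = upper_left_y - (row + 0.5) * cell_size_m
    return web_mercator_to_lonlat(x, y)


x, y = lonlat_to_web_mercator(-91.5, 37.6)
print(web_mercator_to_lonlat(x, y))
```

The source value for each output cell would then be sampled from the input raster at the returned coordinates; the resampling method is left out of this sketch.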
Performance Profiling
• Performance profiling is an important tool for developing scalable and efficient high performance applications
• Performance profiling identified computational bottlenecks in pRasterBlaster
• The next slides demonstrate one example of the value of profilers for pRasterBlaster
A Computational Bottleneck: Symptom
A Computational Bottleneck: Cause
A Computational Bottleneck: Analysis
• Spatial data-dependent performance anomaly
  – The anomaly is data dependent
  – Four corners of the raster dataset were processed by processors whose indexes are close to the two ends
• Exception handling in C++ is costly
  – Coordinate transformation on nodata area was handled as an exception
• Solution
  – Remove the C++ exception handling part
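The pattern behind this fix can be sketched as follows: instead of letting an invalid cell lookup raise and then recovering, test the bounds first and return a nodata value directly. The sketch is a Python analogue for illustration only. The project code is C++, where exception cost differs, and the function names and the nodata value here are hypothetical.

```python
NODATA = -9999  # hypothetical nodata marker


def read_cell_with_exception(raster, row, col):
    """Exception-based pattern: invalid cells raise, then recover to NODATA."""
    try:
        if row < 0 or col < 0:
            # Python wraps negative indexes, so raise explicitly.
            raise IndexError("cell lies outside the input raster")
        return raster[row][col]
    except IndexError:
        return NODATA


def read_cell_with_bounds_check(raster, row, col):
    """Check-first pattern: invalid cells are skipped without raising."""
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return NODATA


grid = [[1, 2, 3], [4, 5, 6]]
probes = [(0, 0), (1, 2), (-1, 0), (0, 3), (2, 1)]
print([read_cell_with_bounds_check(grid, r, c) for r, c in probes])
```

Both functions return the same values; the difference is that the second never enters the exception path for cells that map outside the input raster.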
A Computational Bottleneck: Performance Improvement
A Computational Bottleneck: Summary
• Symptom
  – Processors responsible for polar regions spent more time than those processing the equatorial region
• Cause
  – Corner cells were mapped to invalid input raster cells, generating exceptions
  – C++ exception handling was expensive
• Solution
  – Removed C++ exception handling
  – Corner cells need not be processed
    • They now contribute less computation time
pRasterBlaster Component View
[Diagram: components librasterblaster, pRasterBlaster, and MapIMG; audiences Cyberinfrastructure Service Providers, GIS Programmers, and End Users; labels "via API" and "CyberToolkit"]
Performance
Test:
- On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center)
- Using a parallel file system (Lustre) and MPI I/O (vs. traditional Network File System (NFS))
- 40 GB of data
- Processor cores were increased from 256 to 1024
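Results from a scaling test like this one are usually summarized as speedup and parallel efficiency relative to the smallest core count. The sketch below shows the calculation. The timings are hypothetical placeholders, not measurements from the Trestles runs.

```python
def relative_speedup(base_seconds, scaled_seconds):
    """Speedup of the scaled run relative to the baseline run."""
    return base_seconds / scaled_seconds


def parallel_efficiency(base_cores, base_seconds, scaled_cores, scaled_seconds):
    """Achieved speedup divided by the ideal speedup (the core-count ratio)."""
    ideal_speedup = scaled_cores / base_cores
    return relative_speedup(base_seconds, scaled_seconds) / ideal_speedup


# HYPOTHETICAL wall-clock times in seconds, keyed by core count.
hypothetical_seconds = {256: 400.0, 512: 220.0, 1024: 130.0}

for cores, seconds in hypothetical_seconds.items():
    efficiency = parallel_efficiency(256, hypothetical_seconds[256], cores, seconds)
    print(f"{cores} cores: efficiency {efficiency:.2f}")
```

An efficiency near 1.0 indicates near-linear scaling; values well below 1.0 at higher core counts often point to I/O or communication overhead.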
Obstacles, Issues, Challenges
• Parallel I/O (particularly raster) is the proverbial long pole in the tent
• Raster decomposes nicely (embarrassingly parallel)
• File I/O (especially output file re-composition) is a huge bottleneck
• Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster
  – GeoTIFF (SPTW – Simple Parallel TIFF Writer) led by David Mattli, USGS
  – HDF5 parallel work by Babak Behzad, UIUC
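One way to picture why raster decomposes nicely, and where output re-composition comes in, is a row-strip partition: each process takes a contiguous band of rows and, for an uncompressed row-major file, can compute the byte range it owns in the shared output. The sketch below is a simplified illustration under that layout assumption. It is not how SPTW is implemented, and the function names are hypothetical.

```python
def partition_rows(n_rows, n_procs):
    """Split n_rows into contiguous strips, one (start_row, row_count) per process."""
    base, extra = divmod(n_rows, n_procs)
    strips = []
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)  # spread the remainder rows
        strips.append((start, count))
        start += count
    return strips


def strip_byte_ranges(n_rows, n_cols, bytes_per_cell, n_procs, data_offset=0):
    """Byte (offset, length) per process, assuming uncompressed row-major data."""
    row_bytes = n_cols * bytes_per_cell
    return [
        (data_offset + start * row_bytes, count * row_bytes)
        for start, count in partition_rows(n_rows, n_procs)
    ]


print(partition_rows(10, 4))
print(strip_byte_ranges(10, 5, 4, 4))
```

With non-overlapping byte ranges, each process can write its strip to the shared file independently, for example through MPI I/O, rather than funneling all output through one writer.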
Computational Challenges
• Converting legacy (linear) code to HPC (parallel) environment requires a lot of skilled manpower
• Scaling to large-scale analysis using HPC resources is difficult
• Cyberinfrastructure-based computational analysis needs in-depth knowledge of and expertise in computational performance profiling and analysis
Geospatial Analytics
Spatial Modeling/ Geovisualization
• Solving “Changing World” Problems
• Smart Decisions
• Protecting Natural Resources
• Democratizing Science
• Empowering cultures
• Products and Services for society and its citizens
Data & Software → Solving (Geospatial) Problems
References
• Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at the Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
• Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
• Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.
• Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland.
• Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256.
• Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.
• http://cegis.usgs.gov/
• http://nationalmap.gov/3DEP/
• http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page
• http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster
Questions?
http://cegis.usgs.gov/index.html