A Kepler-based Three Tier Architecture applied to LiDAR Interpolation and Analysis Efrat Frank,...

  • date post

    22-Dec-2015


A Kepler-based Three Tier Architecture applied to LiDAR Interpolation and Analysis

Efrat Frank, Ilkay Altintas, San Diego Supercomputer Center, UCSD

[Figure: LiDAR workflow overview. Portal: configuration phase. Grid: Subset (DB2 query on DataStar); Analyze (move, process), with Interpolate: GRASS RST, GRASS IDW, GMT…; Visualize (move, render, display), with Global Mapper, FlederMaus, ArcIMS. Supporting steps: scheduling/output processing and monitoring/translation.]
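The subset step keeps only the returns inside the user's area of interest, before any data is moved or interpolated. Below is a minimal in-memory sketch of that selection. It assumes that points are plain (x, y, z) tuples and that the area is an axis-aligned bounding box. The `Point`, `BBox` and `subset` names are hypothetical. The actual system performs this step as a DB2 spatial query, which would use a spatial index rather than the linear scan shown here.

```python
from typing import Iterable, List, NamedTuple


class Point(NamedTuple):
    """One LiDAR return (hypothetical layout): easting, northing, elevation."""
    x: float
    y: float
    z: float


class BBox(NamedTuple):
    """Axis-aligned area of interest, e.g. a rectangle drawn on the portal map."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        # Inclusive on every edge, so points lying on the boundary are kept.
        return self.xmin <= p.x <= self.xmax and self.ymin <= p.y <= self.ymax


def subset(points: Iterable[Point], box: BBox) -> List[Point]:
    """Return the points that fall inside the box (linear scan, no spatial index)."""
    return [p for p in points if box.contains(p)]
```

For example, with `cloud = [Point(0.5, 0.5, 10.0), Point(2.0, 2.0, 12.0), Point(5.0, 1.0, 9.0)]`, the call `subset(cloud, BBox(0.0, 0.0, 3.0, 3.0))` keeps the first two points and drops the one at x = 5.0.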

Example of LiDAR data acquired along the Northern San Andreas fault in Sonoma County, California. Left: hillshade produced from the first-return surface DEM (Digital Elevation Model) derived from the LiDAR data. In this heavily forested region, the first-return surface largely shows the tree canopy top. Right: hillshade of the last-return surface DEM for the same area. The multiple returns recorded by LiDAR allow for “virtual deforestation” and the creation of a “bare-earth” model of the ground surface. Note the San Andreas fault and the roads, which are not visible in the first-return hillshade. LiDAR data represent an important new tool for studying the earth’s surface, especially in regions where heavy vegetation makes traditional techniques such as aerial photography ineffective. (Source: Christopher J. Crosby, J. Ramon Arrowsmith, GEON, ASU)
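The first-return and last-return surfaces in the figure come from gridding different subsets of the returns. The sketch below illustrates the idea under several simplifying assumptions:

- Each return carries its return number and the pulse's total number of returns.
- The first-return surface keeps the highest first return in each cell.
- The last-return surface keeps the lowest last return in each cell.

The field layout and the max/min aggregation are illustrative choices, not the method used to produce these hillshades. Production bare-earth models normally rely on ground classification rather than on last returns alone.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One return as (x, y, z, return_number, number_of_returns); this layout is assumed.
Return = Tuple[float, float, float, int, int]
Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a coordinate to the (column, row) index of its grid cell."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def return_surface(returns: Iterable[Return], cell_size: float, which: str) -> Dict[Cell, float]:
    """Grid a first-return or last-return surface by binning returns into cells.

    which="first": highest first-return elevation per cell (tends to follow the canopy top).
    which="last":  lowest last-return elevation per cell (a crude stand-in for bare earth).
    Cells without a qualifying return are left out of the result.
    """
    if which not in ("first", "last"):
        raise ValueError("which must be 'first' or 'last'")
    cells: Dict[Cell, List[float]] = defaultdict(list)
    for x, y, z, return_number, number_of_returns in returns:
        keep = (return_number == 1) if which == "first" else (return_number == number_of_returns)
        if keep:
            cells[cell_of(x, y, cell_size)].append(z)
    pick = max if which == "first" else min
    return {cell: pick(zs) for cell, zs in cells.items()}
```

Consider a cell containing only one pulse, which hits the canopy at elevation 30 and the ground at elevation 2. The first-return grid reports 30.0 for that cell and the last-return grid reports 2.0. This is the “virtual deforestation” effect in miniature.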

LiDAR Introduction

[Figure: Survey → Point Cloud (x, y, z, n, …) → Process & Classify → Interpolate / Grid → Analyze / “Do Science”. Image credits: R. Haugerud, U.S.G.S.; D. Harding, NASA.]

•LiDAR (Light Detection and Ranging, a.k.a. ALSM, Airborne Laser Swath Mapping) produces high-density point cloud datasets whose processing calls for high-performance computing.
•LiDAR generates massive data volumes; billions of returns are common.
•Distributing these volumes of point cloud data to users via the internet is a significant challenge.
•Processing and analysis of these data require significant computing resources not available to most geoscientists.
•Interpolation of these data challenges typical GIS/interpolation software.

•Our tests indicate that ArcGIS, Matlab and similar software packages struggle to interpolate even a small portion of these data.

•Traditionally, popularity > resources: demand for processing LiDAR data has outpaced the computing resources available to users.
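A back-of-the-envelope calculation shows why desktop tools struggle. The sketch assumes that each return is stored as three 64-bit floats (x, y, z) with no other attributes. Under that assumption, one billion returns occupy 24 GB before any interpolation begins. A 1 m grid over a 10 km × 10 km area has 100 million cells. The point count, field width and grid extent are illustrative assumptions, not figures from the GEON tests.

```python
import math


def point_cloud_bytes(num_points: int, fields_per_point: int = 3, bytes_per_field: int = 8) -> int:
    """Raw size of a point cloud held as fixed-width numeric fields.

    Defaults assume x, y, z stored as 64-bit floats and no other attributes.
    """
    return num_points * fields_per_point * bytes_per_field


def grid_cells(width_m: float, height_m: float, resolution_m: float) -> int:
    """Number of cells in a regular grid covering width_m by height_m."""
    return math.ceil(width_m / resolution_m) * math.ceil(height_m / resolution_m)


if __name__ == "__main__":
    n = 1_000_000_000  # one billion returns (illustrative)
    print(f"{point_cloud_bytes(n) / 1e9:.0f} GB for x, y, z alone")
    print(f"{grid_cells(10_000, 10_000, 1.0):,} cells for 10 km x 10 km at 1 m")
```

Running it prints 24 GB and 100,000,000 cells. Actual sizes depend on the storage format and on how many attributes (intensity, return number, classification, …) are kept per return.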

The Computational Challenge:

•GOAL: Efficient three-tier architecture for LiDAR interpolation and analysis using GEON infrastructure and tools

•GEON Portal - front-end layer
•Kepler Scientific Workflow System - control layer
•Kepler is used as a batch execution engine
•GEON Grid - computation layer

•Use scientific workflows to glue/combine different tools and the infrastructure
•The architecture provides efficient and reliable LiDAR data analysis
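The sketch below shows one way the front-end layer could hand a job to the control layer, to make the division of labor between the tiers concrete. The portal serializes the user's selections into a parameter XML document, and the workflow side parses it back. The element names (`lidarJob`, `bbox`, `algorithm`, `param`, `output`) and both functions are hypothetical. The poster mentions a parameter XML and a workflow description but does not give their schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_job_description(bbox: Dict[str, float], algorithm: str,
                          params: Dict[str, str], output_format: str) -> str:
    """Portal side: serialize the user's selections into parameter XML (hypothetical schema)."""
    root = ET.Element("lidarJob")
    ET.SubElement(root, "bbox", {k: str(v) for k, v in bbox.items()})
    alg = ET.SubElement(root, "algorithm", {"name": algorithm})
    for name, value in params.items():
        ET.SubElement(alg, "param", {"name": name, "value": value})
    ET.SubElement(root, "output", {"format": output_format})
    return ET.tostring(root, encoding="unicode")


def read_job_description(xml_text: str) -> Dict[str, object]:
    """Control side: parse the parameter XML back into plain Python values."""
    root = ET.fromstring(xml_text)
    alg = root.find("algorithm")
    return {
        "bbox": {k: float(v) for k, v in root.find("bbox").attrib.items()},
        "algorithm": alg.get("name"),
        "params": {p.get("name"): p.get("value") for p in alg.findall("param")},
        "output": root.find("output").get("format"),
    }
```

A round trip through `build_job_description` and `read_job_description` returns the same bounding box, algorithm name, parameters and output format. A control layer would need exactly that to instantiate a workflow for the job.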

GEON’s Solution: A Three-Tier Architecture for LiDAR Processing

[Figure: GEON’s three-tier architecture for LiDAR processing. Client / GEON Portal: select map and attributes, choose GRASS functions and parameters, and submit; the portal creates a workflow description (parameter XML). KEPLER WORKFLOW: DB2 spatial query for x, y, z and attribute raw data, map onto the grid, process, render map, and output. Resources shown: DB2, NFS-mounted disk, compute cluster, ArcInfo, ArcSDE, ArcIMS. GRASS surfacing algorithms: spline, IDW, block mean … Download formats: binary grid, ASCII grid, text file, TIFF/JPEG/GIF.]
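Inverse distance weighting (IDW) is one of the GRASS surfacing algorithms listed in the architecture. It estimates each grid node as a weighted average of sample elevations, with weights 1/d^p. The following is a brute-force sketch of that technique over all points. It assumes (x, y, z) tuples and a regular grid evaluated at cell centers. It is not the GRASS implementation, which typically restricts each estimate to a set of nearest points and is far faster on large clouds.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def idw_at(points: Sequence[Point], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate of z at (x, y), using every point.

    Each sample gets weight 1 / d**power; a sample exactly at (x, y) is returned directly.
    """
    if not points:
        raise ValueError("need at least one point")
    num = 0.0
    den = 0.0
    for px, py, pz in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return pz
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


def idw_grid(points: Sequence[Point], xmin: float, ymin: float, cell: float,
             ncols: int, nrows: int, power: float = 2.0) -> List[List[float]]:
    """Evaluate IDW at the center of every cell; row 0 is the lowest-y row."""
    return [
        [idw_at(points, xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell, power)
         for c in range(ncols)]
        for r in range(nrows)
    ]
```

The cost is proportional to points × cells. With billions of returns, that cost is what pushes interpolation onto a compute cluster.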

http://geongrid.org

Kepler includes contributors from GEON, SEEK, SDM Center, Ptolemy II, ROADNet, CIPRes and Resurgence supported by NSF ITRs 0225673 (GEON), 022567 (SEEK), DOE DE-FC02-01ER25486 (SciDAC/SDM), and DARPA F33615-00-C-1703 (Ptolemy).

Future Plans
•Improve overall performance using advanced processing tools
  •Parallel interpolation, enhanced visualization
•Extend built-in failure recovery and reporting features
•Additional portal execution and registration support
•Utilize provenance information for workflow product registration
•Create a graphical illustration of job progress / location in the workflow to demonstrate the distributed nature of the system

ULTIMATE GOAL: Make it useful to a wide range of earth science users!

Contributors: Efrat Jaeger-Frank, Ilkay Altintas, Chaitan Baru, Ashraf Memon, Viswanath Nandigam (GEON, San Diego Supercomputer Center, UCSD); Christopher J. Crosby, Jefferey S. Conner, J. Ramon Arrowsmith (GEON, ASU)

•An extensible, easy-to-use workflow design and prototyping tool
•On-the-fly creation of workflow instances from workflow templates
•Integration of heterogeneous local and remote tools in a single interface:
  •Gridding and imaging services via Web and Grid services
  •GIS services
  •Remote tools via SSH, SCP and GridFTP
  •Relational and spatial database access
  •Direct access to data and tools from remote repositories
•Reusable generic and domain-specific actors
•Support for high-performance computing:
  •Job submission and monitoring
  •Logging of execution traces and registration of intermediate products
  •Data provenance and failure recovery
•Portal accessibility: the GEON LiDAR Workflow is deployed on the GEON portal
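On-the-fly creation of workflow instances from templates means that a single parameterized workflow is filled in for each job. The sketch below uses Python's `string.Template` with a plain-text stand-in for a workflow. The template text and the `instantiate` helper are hypothetical. Real Kepler workflows are MoML (XML) documents, not the three-line format shown here.

```python
from string import Template

# Hypothetical plain-text workflow template; each $name is filled per job.
WORKFLOW_TEMPLATE = Template(
    "subset bbox=$xmin,$ymin,$xmax,$ymax\n"
    "interpolate method=$method resolution=$resolution\n"
    "export format=$fmt\n"
)


def instantiate(template: Template, **params: object) -> str:
    """Create one workflow instance; substitute() raises KeyError for any unfilled placeholder."""
    return template.substitute(**params)
```

`substitute` raises `KeyError` whenever a placeholder is left unfilled. As a result, an incomplete job request fails before it reaches the grid rather than part-way through a long run.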

•Reverse engineering of traditional approach

•The GEON LiDAR Workflow (GLW) is exposed to a high risk of component failures:
  •Long-running process
  •Distributed computational resources under diverse controlling authorities
•Kepler provides transparent/background error handling using provenance data
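The failure-recovery idea can be sketched as a retry-and-resume loop:

- Each completed step's output is recorded.
- A failing step is retried a few times.
- A later run reuses the recorded outputs instead of recomputing them.

This is a generic pattern written for illustration. The step names, the `provenance` dictionary and `run_with_recovery` are hypothetical and do not describe Kepler's actual provenance framework.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[object], object]]


def run_with_recovery(steps: List[Step], data: object,
                      provenance: Dict[str, object], max_retries: int = 2) -> object:
    """Run steps in order, reusing any output already recorded in `provenance`.

    A failing step is retried up to max_retries extra times. If it still fails, the
    exception propagates, and a later call with the same provenance dict resumes
    from that step instead of starting over.
    """
    for name, fn in steps:
        if name in provenance:          # finished in an earlier run: reuse its product
            data = provenance[name]
            continue
        for attempt in range(max_retries + 1):
            try:
                data = fn(data)
                break
            except Exception:           # broad on purpose: any step failure triggers a retry
                if attempt == max_retries:
                    raise
        provenance[name] = data         # record the intermediate product
    return data
```

In the sketch, one retry absorbs a transient failure in a step. Calling the function again with the same `provenance` dictionary returns the stored result without rerunning any step.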

•A unified interface to follow up on the status of submitted jobs:
  •View job metadata
  •Zoom to a specific bounding box location
  •Track errors
  •Modify a job and re-submit
  •View the processing results
  •In the future, register desired workflow products
  •Useful for publication
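The job-monitoring interface implies a small amount of state for each job:

- a status;
- the bounding box used for “zoom to”;
- the parameters;
- any recorded errors.

It also needs a way to resubmit a modified copy. A minimal in-memory model of that state follows. The status values, allowed transitions and class names are assumptions for illustration, not the GEON portal's data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Status(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"
    FAILED = "failed"
    DONE = "done"


# Hypothetical allowed transitions; any other change is rejected.
TRANSITIONS: Dict[Status, Set[Status]] = {
    Status.SUBMITTED: {Status.RUNNING, Status.FAILED},
    Status.RUNNING: {Status.DONE, Status.FAILED},
    Status.FAILED: set(),
    Status.DONE: set(),
}


@dataclass
class Job:
    job_id: int
    bbox: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) for "zoom to"
    params: Dict[str, str]
    status: Status = Status.SUBMITTED
    errors: List[str] = field(default_factory=list)

    def advance(self, new: Status, error: str = "") -> None:
        """Move to a new status, optionally recording an error message."""
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new.value} not allowed")
        self.status = new
        if error:
            self.errors.append(error)


class JobTable:
    """In-memory stand-in for a portal's list of submitted jobs."""

    def __init__(self) -> None:
        self.jobs: Dict[int, Job] = {}
        self._next_id = 1

    def submit(self, bbox: Tuple[float, float, float, float], params: Dict[str, str]) -> Job:
        job = Job(self._next_id, bbox, dict(params))  # copy so later edits don't leak in
        self.jobs[job.job_id] = job
        self._next_id += 1
        return job

    def resubmit(self, job_id: int, **changes: str) -> Job:
        """Create a new job from an earlier one, overriding selected parameters."""
        old = self.jobs[job_id]
        return self.submit(old.bbox, {**old.params, **changes})
```

Resubmitting creates a new job rather than mutating the failed one, so the original record and its errors remain available for tracking.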

LiDAR Job Management and Monitoring

•Online data acquisition and access
•Managing large databases:
  •Indexing data on spatial and temporal attributes
  •Quick subsetting operations
•Large-scale resource sharing and management
•Collaborative and distributed applications
•Parallel gridding algorithms on large data sets using high-performance computing
•Integrate data with other related data sets, e.g., geologic maps and hydrology models
•Provide easy-to-use user interfaces from portals and scientific workflow environments
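Parallel gridding usually starts by partitioning the area into tiles that can be processed independently. For the block-mean algorithm this partitioning is exact, because each cell's value depends only on the points inside that cell, so tiles never need to exchange data. The sketch below assumes (x, y, z) tuples and uses a thread pool for portability. The function names and the default tile size are hypothetical. Neighborhood-based methods such as IDW or splines would also need a buffer of points around each tile.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)
Cell = Tuple[int, int]              # (column, row)


def cell_index(x: float, y: float, xmin: float, ymin: float, cell: float) -> Cell:
    """Grid cell containing (x, y) for a grid whose origin is (xmin, ymin)."""
    return (math.floor((x - xmin) / cell), math.floor((y - ymin) / cell))


def block_mean(points: Sequence[Point], xmin: float, ymin: float, cell: float) -> Dict[Cell, float]:
    """Mean elevation of the points in each cell; empty cells are omitted."""
    sums: Dict[Cell, float] = {}
    counts: Dict[Cell, int] = {}
    for x, y, z in points:
        key = cell_index(x, y, xmin, ymin, cell)
        sums[key] = sums.get(key, 0.0) + z
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def split_into_tiles(points: Sequence[Point], xmin: float, ymin: float,
                     cell: float, tile_cells: int) -> List[List[Point]]:
    """Group points by tile, where a tile spans tile_cells x tile_cells grid cells."""
    tiles: Dict[Tuple[int, int], List[Point]] = {}
    for p in points:
        col, row = cell_index(p[0], p[1], xmin, ymin, cell)
        tiles.setdefault((col // tile_cells, row // tile_cells), []).append(p)
    return list(tiles.values())


def parallel_block_mean(points: Sequence[Point], xmin: float, ymin: float, cell: float,
                        tile_cells: int = 256, workers: int = 4) -> Dict[Cell, float]:
    """Grid each tile independently, then merge; tiles never share a cell."""
    tiles = split_into_tiles(points, xmin, ymin, cell, tile_cells)
    worker = partial(block_mean, xmin=xmin, ymin=ymin, cell=cell)
    result: Dict[Cell, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for tile_grid in pool.map(worker, tiles):
            result.update(tile_grid)
    return result
```

CPython's global interpreter lock means the thread pool will not speed up this pure-Python loop; the structure is the point. The worker is a `functools.partial` of a module-level function, so `ProcessPoolExecutor` can in principle be swapped in. On a cluster, each tile would instead become a separate job.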

Increasing Usage of Technology in Geosciences

LiDAR Processing via Kepler