
Page 1: A NEW INITIATIVE FOR TILING, STITCHING AND PROCESSING GEOSPATIAL BIG DATA IN DISTRIBUTED COMPUTING ENVIRONMENTS

Tiling and Stitching raster data - GIS data processing in distributed computing environment

Angéla Olasz, Binh Nguyen Thai, Dániel Kristóf

Institute of Geodesy, Cartography and Remote Sensing (FÖMI), Directorate of Geoinformation

Page 2: Content of this talk

1. Introduction
2. IQmulus and IQLib: short introduction
3. Defining Geospatial Big Data
4. Aspects of requirements and comparison of existing solutions
5. Design and implementation
6. Conclusion/future work

GIS data processing in distributed computing environment • ISPRS Prague, 12-19 July 2016

Page 3: Introduction

Our goal is to find a solution for processing big geospatial data in a distributed ecosystem: an environment that runs algorithms, services and processing modules without restricting the implementation programming language, the data partitioning strategy, or the distribution of data among computational nodes, so that existing GIS processing scripts can be run.

As a first step we focus on raster data, covering (i) decomposition, i.e. tiling (and stitching), and (ii) distributed processing. Before building this prototype system, we:

1. analyzed data decomposition patterns;
2. defined the common GIS user requirements on processing environments for Big Geospatial Data;
3. tried to define Geospatial Big Data with the help of the four "V"s;
4. compared existing solutions on selected aspects.
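The introduction asks for user control over data partitioning and over the distribution of partitions among computational nodes. Below is a minimal Python sketch of what "pluggable" distribution could look like, assuming tiles are already identified by their (row, column) index in a regular grid: the caller passes the strategy function, so swapping round-robin for spatially contiguous bands needs no change to the framework itself. All names (`distribute`, `round_robin`, `row_bands`) are hypothetical and are not part of IQLib.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; not part of IQLib.
TileKey = Tuple[int, int]        # (tile_row, tile_col) in a regular tiling grid
Plan = Dict[int, List[TileKey]]  # node index -> tiles assigned to that node


def round_robin(tiles: List[TileKey], n_nodes: int) -> Plan:
    """Deal tiles out to nodes in turn, ignoring spatial adjacency."""
    plan: Plan = {node: [] for node in range(n_nodes)}
    for i, tile in enumerate(tiles):
        plan[i % n_nodes].append(tile)
    return plan


def row_bands(tiles: List[TileKey], n_nodes: int) -> Plan:
    """Give each node a contiguous band of tile rows, so neighbouring tiles tend to share a node."""
    rows = sorted({row for row, _ in tiles})
    band_of_row = {row: i * n_nodes // len(rows) for i, row in enumerate(rows)}
    plan: Plan = {node: [] for node in range(n_nodes)}
    for tile in tiles:
        plan[band_of_row[tile[0]]].append(tile)
    return plan


def distribute(tiles: List[TileKey], n_nodes: int,
               strategy: Callable[[List[TileKey], int], Plan]) -> Plan:
    """The caller, not the storage layer, decides how tiles are spread over nodes."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return strategy(tiles, n_nodes)


grid = [(r, c) for r in range(4) for c in range(2)]  # 4 x 2 tiles
print(distribute(grid, 2, row_bands))
# {0: [(0, 0), (0, 1), (1, 0), (1, 1)], 1: [(2, 0), (2, 1), (3, 0), (3, 1)]}
```

With round-robin the two tile columns end up on different nodes; with row bands each node receives whole, adjacent rows of tiles, which keeps neighbouring tiles together.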

Page 4: IQmulus

A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets

"IQmulus will leverage the information hidden in large heterogeneous geospatial data sets and make them a practical choice to support reliable decision making."

4-year FP7 EU research project, November 2012 to November 2016; 12 European partners from 7 countries; www.IQmulus.eu

Page 5: Defining Geospatial Big Data

To better understand the main attributes of geospatial big data, note that it is hard to delineate the point at which data start to "exceed the capability of spatial computing technology". Estimates of the processable amount of data are use-case specific; there are good examples (Evans et al., 2014) in which the authors tried to identify the differences between geospatial data and geospatial big data.

The digital representations of continuous space can be grouped into four or five types: vector, raster, 3D representation, network (and geolocation-aware media). We collected characteristics on formats, GIS operations, the three "V"s of Big Data (Volume, Velocity, Variety) and also Visualization.

Page 6: Defining Geospatial Big Data

Page 7: Aspects of requirements and comparison of existing solutions

We collected the most popular frameworks supporting distributed computing with GIS data and investigated the capabilities of each framework with respect to the following aspects:

• input/output data types supported by, or suitable for, the framework;
• supported GIS processing (or executable languages);
• data management flexibility: supervision of the data distribution;
• scalability potential;
• supported OS/platform dependencies;
• database model;
• GIS case studies, projects …

Page 8: Aspects of requirements and benchmarking of existing solutions

Page 9: Initiative for a new framework

Most current processing frameworks follow the same methodology as Hadoop and use the same data storage concept as HDFS. From a processing point of view, one of the biggest disadvantages was the data partitioning mechanism performed by the HDFS file system and by the distributed processing programming model.

In most cases we would like to have full control over our data partitioning and distribution mechanism.

Existing GIS algorithms (written in Python, MATLAB, R, etc.) cannot be executed in these frameworks as they are, or with only small modifications.

We decided to develop our own distributed processing framework.
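The partitioning problem can be made concrete with a small calculation. HDFS stores a file as consecutive fixed-size blocks (`dfs.blocksize`, 128 MB by default in Hadoop 2 and later) without looking at the file's internal structure. The sketch below assumes an uncompressed, row-major raster written as a single headerless byte stream (real formats such as internally tiled or compressed GeoTIFFs are laid out differently) and reports which pixel rows end up split across two blocks. The function name and the toy sizes are hypothetical.

```python
def rows_straddling_blocks(width: int, height: int, bytes_per_pixel: int,
                           block_size: int) -> list:
    """Return the indices of raster rows whose bytes fall into more than one block.

    Assumption: an uncompressed, row-major raster stored as one flat byte stream
    with no header, cut into consecutive fixed-size blocks (as HDFS does).
    """
    row_bytes = width * bytes_per_pixel
    straddling = []
    for row in range(height):
        first_byte = row * row_bytes
        last_byte = first_byte + row_bytes - 1
        # A row is split if its first and last byte land in different blocks.
        if first_byte // block_size != last_byte // block_size:
            straddling.append(row)
    return straddling


# Toy sizes: 1000 x 10 pixels, 2 bytes per pixel (e.g. int16), 4096-byte blocks.
print(rows_straddling_blocks(1000, 10, 2, 4096))  # [2, 4, 6, 8]
# A block size that is a multiple of the row size (2000 bytes) splits no row.
print(rows_straddling_blocks(1000, 10, 2, 4000))  # []
```

Even when no row is split, a square tile of, say, 256 x 256 pixels is scattered over 256 separate byte ranges in a row-major file, which is one reason why a partitioning scheme that is aware of the raster's geometry is attractive.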

Page 10: IQLib - Objectives

Source: IQLib specification, https://github.com/posseidon/IQLib/

The main goal is to allow an actor (a human or machine code) to organize huge data sets describing geographical survey areas.

IQLib supports the creation of semantic data aggregations within large data sets, and can be used to overcome scalability limitations of processing algorithms.

IQLib's core functionality is:

1. Tiling: the decomposition of a survey area in which data points are associated with polygons (regular or irregular), grouped according to temporal attributes, grouped into equally sized chunks, or a mixture of the above.
2. Stitching: IQLib should provide the functionality to stitch the output data files into a single large file.
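A minimal NumPy sketch of the "equally sized chunks" case of tiling and of the matching stitching step, assuming the raster fits in memory as a 2-D array and that the processing service works pixel by pixel. The function names are hypothetical, not IQLib's API, and polygon-based or temporal grouping is not covered.

```python
import numpy as np


def tile_raster(raster: np.ndarray, tile_rows: int, tile_cols: int) -> dict:
    """Split a 2-D raster into a regular grid of equally sized tiles.

    Returns {(row_offset, col_offset): tile}. Tiles on the bottom and right edges
    are smaller when the raster size is not a multiple of the tile size.
    """
    height, width = raster.shape
    return {
        (r, c): raster[r:r + tile_rows, c:c + tile_cols].copy()
        for r in range(0, height, tile_rows)
        for c in range(0, width, tile_cols)
    }


def stitch_tiles(tiles: dict, shape: tuple, dtype) -> np.ndarray:
    """Write each tile back at its offset to rebuild a single array."""
    mosaic = np.empty(shape, dtype=dtype)
    for (r, c), tile in tiles.items():
        mosaic[r:r + tile.shape[0], c:c + tile.shape[1]] = tile
    return mosaic


raster = np.arange(35, dtype=np.int32).reshape(5, 7)
tiles = tile_raster(raster, 2, 3)                       # 3 x 3 grid, edge tiles smaller
processed = {key: t * 10 for key, t in tiles.items()}   # stand-in for a per-pixel service
result = stitch_tiles(processed, raster.shape, raster.dtype)
assert np.array_equal(result, raster * 10)
```

Because these tiles do not overlap, the stitched result only matches the untiled result for per-pixel operations. Neighbourhood operations such as filters or slope computation would need tiles with an overlapping buffer that is trimmed again during stitching, and real tiles would also have to carry their georeferencing (for example a per-tile geotransform).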

Page 11: High-level concept of the IQLib processing framework

Page 12: Modules

As a result, IQLib has three major modules, each responsible for a major step in GIS data processing.

Data Catalogue module: responsible for storing the metadata corresponding to survey areas. A survey area contains all datasets that are logically related to the inspection area; the catalogue stores all available, known and useful information about those data for processing.

Tiling & stitching module: tiling algorithms usually process raw data, after which the tiled data are distributed across processing nodes. Stitching usually runs after the processing services have successfully done their job. Metadata of tiled datasets are registered in the Data Catalogue, so the parents of tiled data are always known.

Distributed processing module: responsible for running processing services on tiled datasets.
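The Data Catalogue's role of recording which tile came from which parent dataset can be sketched with an in-memory registry. This is a hypothetical model: the class and field names (`DataCatalogue`, `DatasetRecord`, `parent_id`, `attributes`) and the file paths are assumptions for illustration, not the data model defined in the IQLib specification, and a real catalogue would be backed by a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Metadata for one dataset in a survey area (hypothetical fields)."""
    dataset_id: str
    survey_area: str
    path: str
    parent_id: Optional[str] = None  # None for raw, untiled input data
    attributes: Dict[str, str] = field(default_factory=dict)


class DataCatalogue:
    """Hypothetical in-memory stand-in for a metadata catalogue with parent links."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        if record.dataset_id in self._records:
            raise ValueError(f"duplicate dataset id: {record.dataset_id}")
        # A tile may only be registered once its parent is known.
        if record.parent_id is not None and record.parent_id not in self._records:
            raise KeyError(f"unknown parent dataset: {record.parent_id}")
        self._records[record.dataset_id] = record

    def parent_of(self, dataset_id: str) -> Optional[DatasetRecord]:
        parent_id = self._records[dataset_id].parent_id
        return None if parent_id is None else self._records[parent_id]

    def children_of(self, dataset_id: str) -> List[DatasetRecord]:
        return [r for r in self._records.values() if r.parent_id == dataset_id]

    def survey_area(self, name: str) -> List[DatasetRecord]:
        return [r for r in self._records.values() if r.survey_area == name]


catalogue = DataCatalogue()
catalogue.register(DatasetRecord("dem", "area-1", "/data/dem.tif"))
for r, c in [(0, 0), (0, 512)]:
    catalogue.register(DatasetRecord(f"dem_{r}_{c}", "area-1", f"/tiles/dem_{r}_{c}.tif",
                                     parent_id="dem", attributes={"offset": f"{r},{c}"}))
print([t.dataset_id for t in catalogue.children_of("dem")])  # ['dem_0_0', 'dem_0_512']
print(catalogue.parent_of("dem_0_512").path)                 # /data/dem.tif
```

Refusing to register a tile whose parent is unknown is one simple way to guarantee that the parent of every tile can always be looked up before stitching.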

Page 13: Modules - status

Data Catalogue module: we have defined and implemented the data model and the data/metadata access procedures. After the final approval phase it goes open source.

Tiling & stitching module: pre-defined tiling and stitching methods tailored for processing algorithms.

Distributed processing module: uses existing processing algorithms without any modification or with very little adjustment.
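Running existing processing algorithms without modification can be illustrated by treating each algorithm as an external command that receives a tile path as its argument. The sketch below assumes a single machine, with a thread pool standing in for the processing nodes; `run_on_tile`, `run_on_tiles` and the one-line stand-in script are hypothetical, and a real deployment would dispatch the same command to remote nodes through a scheduler.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_on_tile(command: Sequence[str], tile_path: str) -> str:
    """Run an existing command-line tool, unchanged, on one tile.

    The tile path is appended as the last argument; a non-zero exit code
    raises subprocess.CalledProcessError.
    """
    completed = subprocess.run([*command, tile_path],
                               capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def run_on_tiles(command: Sequence[str], tile_paths: List[str],
                 workers: int = 4) -> List[str]:
    """Process tiles concurrently; outputs are returned in the order of tile_paths."""
    # Threads are enough here: the actual work happens in separate OS processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: run_on_tile(command, path), tile_paths))


# Hypothetical stand-in for an existing script: a Python one-liner echoing its argument.
legacy_script = [sys.executable, "-c", "import sys; print('processed', sys.argv[1])"]
print(run_on_tiles(legacy_script, ["dem_0_0.tif", "dem_0_512.tif"]))
# ['processed dem_0_0.tif', 'processed dem_0_512.tif']
```

Wrapping the algorithm as a command keeps it language-agnostic: a Python, R or MATLAB script could be called the same way, provided its interpreter is installed on the node.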

Page 14: Conclusion

The IQLib specification is available on GitHub at https://github.com/posseidon/iqlib.

Under development:
• Tiling & stitching methods ecosystem.
• Distributed processing system.

To be published as open source soon.


Page 16: Thank you for your attention!

Angéla Olasz
Binh Nguyen Thai

Please find more details in our paper in ISPRS Annals 2016.

Institute of Geodesy, Cartography and Remote Sensing (FÖMI), Directorate of Geoinformation
5 Bosnyák sqr., Budapest, Hungary 1149
www.fomi.hu, www.iqmulus.eu