Geospatial Data Processing with Stetl -...

46
Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz - Switserland June 24, 2016 www.justobjects.nl www.stetl.org

Transcript of Geospatial Data Processing with Stetl -...

Page 1: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Geospatial Data Processing with Stetl

Just van den BroeckeGeoPython 2016

Muttenz - SwitserlandJune 24, 2016

www.justobjects.nl

www.stetl.org

Page 2: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

About MeIndependent Open Source Geospatial Professional

+ Secretary OSGeo Dutch Local Chapter + Member of the Dutch OpenGeoGroep

Just van den [email protected] www.justobjects.nl

Page 3: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Agenda

• Spatial ETL• Stetl

• Concepts• Cases• Status

• Q & A

Page 4: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Spatial ETL

Page 5: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Data Wrangling

Page 6: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

ETL - Extract Transform Load

Page 7: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Spatial ETL

GML

PostGIS

Shapefile

WFS

CSV

SQLite

XML

GMLPostGIS

Shapefile

WFSCSV

SQLite

XML

Page 8: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

GMLPostGIS

Shapefile

WFSCSV

SQLite

XML

Spatial ETL Example non-standard source

Page 9: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

From: https://live.osgeo.org/en/overview/gdal_overview.html

Page 10: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz
Page 11: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Plenty of Tools…

Each tool is powerful by itself but cannot do the entire ETL

ogr2ogr

Spatial ETL

Page 12: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

FOSS ETL - How to Combine Components?

=+ + ?+ ..

Page 13: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example - 2011 INSPIRE-FOSS

http://inspire.kademo.nl/doc/design-etl.html

Nice ideas but hard to scale, deploy and reuse.

Need Framework

Page 14: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Solution: Add Python to the Equation

=+ + ?( )+ ..

Page 15: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Stetl

Solution: Add Python to the Equation

=+ +( )+ ..

Page 16: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Stetl =

Simple Streaming

Spatial Speedy

ETL

Page 17: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Stetl Concepts

Page 18: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Process Chain

Input Filter OutputFilter

Stetl concepts

Source Target

Page 19: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Process Chain

Input Filter Outputgml

Filter

Stetl concepts

Page 20: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: GML to PostGIS

GMLReader

PG Output

gml

Stetl concepts

Page 21: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: Data Model Transform

OGRReader XSLT

GMLWriter

gml

Stetl concepts

Simple Features

Complex Features

or Jinja2 !

Page 22: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Process Chain - How?

Input Filters Output

Stetl concepts

Stetl Config File

Instantiate

Page 23: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

XMLInput

XSLTFilter

OGROutput

Page 24: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

The Source File

Page 25: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

XMLInput

Page 26: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

XMLInput

XSLTFilter

Page 27: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

Prepare XSLT Script

Page 28: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

XSLT GML Output

Page 29: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

XMLInput

XSLTFilter

OGROutput

OGC Simple

Features

XML DOM

Page 30: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XML to Shape

The Stetl Config File

ProcessChain

XMLInputXSLT

Filter

ogr2ogrOutput

Page 31: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Running Stetl

stetl -c etl.cfg

stetl -c etl.cfg [-a <properties>]

Page 32: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Installing Stetl - PyPi

Deps•GDAL+Python bindings•lxml (xml proc)•psycopg2 (Postgres)

sudo pip install stetl

Page 33: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Installing Stetl - new: Docker

https://hub.docker.com/r/justb4/stetl

Page 34: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Speed: Streaming

Input Filter Output

gml

Stetl concepts

Page 35: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Speed: Going Native

Input Filter Output

gml

ogr2ogr StetlStetl

Native C Libs/Progs

Calls

Stetl concepts

Page 36: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example Components

Input Filters Output

Stetl concepts

XMLFile XSLT GML

ogr2ogr XMLAssembler GDAL/OGR

LineStream XMLValidator WFS-T

Postgres/PostGIS Jinja2 Postgres/PostGISdeegree* FeatureExtractor deegree*YourInput YourFilter YourOutput

Page 37: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Example: XsltFilter Pythonfrom util import Util, etreefrom filter import Filterfrom packet import FORMAT

log = Util.get_log("xsltfilter")

class XsltFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

self.xslt_file_path = self.cfg.get('script') self.xslt_file = open(self.xslt_file_path, 'r') # Parse XSLT file only once self.xslt_doc = etree.parse(self.xslt_file) self.xslt_obj = etree.XSLT(self.xslt_doc) self.xslt_file.close()

def invoke(self, packet): if packet.data is None: return packet return self.transform(packet)

def transform(self, packet): packet.data = self.xslt_obj(packet.data) log.info("XSLT Transform OK") return packet

Page 38: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

[etl]chains = input_xml_file|my_filter|output_std

[input_xml_file]class = inputs.fileinput.XmlFileInputfile_path = input/cities.xml

# My custom component[my_filter] class = my.myfilter.MyFilter

[output_std]class = outputs.standardoutput.StandardXmlOutput

class MyFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

def invoke(self, packet): log.info("CALLING MyFilter OK!!!!") return packet

Your Own Components

Stetl concepts

Step 1- Define Class

Step 2- Configure Class

Page 39: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Data Structures

Stetl concepts

• Components exchange Packets • Packet contains data and status• Data formats, e.g. :

xml_line_stream etree_docetree_element (feature)etree_element_arraystringany..

Page 40: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Cases

Page 41: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz
Page 42: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Cases - GML to PostGIS

• National Dutch Open Datasets (GML) http://nlextract.nl ✴ Topography: Top10NL, BGT✴ Cadastral Parcels (BRK)

• Ordnance Survey UK✴ PoC Topography (OS Mastermap)

Page 43: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Cases - IoT/SensorWeb• SOSPilot

✴ Dutch Air Quality Data✴ publish to PostGIS+SOS (SOS-T)✴ EU Reporting (Jinja2 Filter)✴ sospilot.geonovum.nl

• Smart Emission✴ AQ Sensors hosted by citizens✴ Calibration/Aggregation✴ publication to SOS and OGC SensorThings✴ data.smartemission.nl

Page 44: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Project Status - June 24, 2016

• v1.0.9 installable via PyPi or Docker • Documentation on www.stetl.org • Real world transforms done

Page 45: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Project - Planned & in progress

• PyWPS integration • GUI - Flask+Celery+Redis? - Node Red? - Jupiter Notebook?• More on GitHub github.com/justb4/stetl

Page 46: Geospatial Data Processing with Stetl - GeoPython2016.geopython.net/talks/geopython-2016-1024-v1.pdf · Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz

Thank You !

www.stetl.org

github.com/justb4/stetl