Toward An Architecture for Processing Spatial Big Data
Andreas Abecker1, Torsten Brauer1, Johannes Kutterer1, Jens Nimis2, Patrick Wiener2
1 Disy Informationssysteme GmbH
2 Hochschule Karlsruhe - Technik und Wirtschaft
www.disy.net
2016, May 25th – Geospatial World Forum, Rotterdam (NL)
Most characteristic big data dimensions
In the coming years, the availability of spatial data will explode
• Cheaper, more easily accessible, and more detailed satellite data (incl. micro and nano satellites)
• More and more application ideas for Unmanned Aerial Vehicles (UAV)
• Cheaper and more powerful in-situ sensors with real-time remote data transfer
• Cheaper and more powerful mobile sensors with real-time remote data transfer, mounted on vehicles, coupled to smartphones, etc.
• Internet of Things, Industry 4.0, etc.
• Volunteered Geographic Information
• Georeferenced social media content
Big data dimensions (figure labels): Volume, Velocity, Veracity, Variety
Analytics of large, heterogeneous, and high-frequency spatial data will drive promising application scenarios
• Precision agriculture
• Smart city monitoring and control
• Disaster management
• Smart energy
• Context-specific marketing and information services
• …
The BigGIS Research Project
• Project: BigGIS: Prescriptive and Predictive GIS Based on High-Dimensional Spatio-Temporal Data Structures
• Duration: April 2015 – March 2018
• Funded by: German Federal Ministry of Education and Research (BMBF)
• Partners: 3 application partners; remote sensing SME; spatial analytics SME; in-memory DB SME; researchers in data integration, data mining, decision support, data visualization, and infrastructure
BigGIS Pilot Application 1: Urban Heat Islands
• Context: Urban micro-climate depends on weather, pollution, land use, urban green, architecture, …
• Goal: More exact assessment of the current situation and short-term, fine-grained prediction of urban micro-climate (temperature, ozone, PM10, …)
• Approach: New measurements plus inter-/extrapolation of measurement data
• Applications:
  - Routing people with minimum heat exposure (cf. OpenSense project)
  - Targeted warnings for high-risk groups
  - Warnings for kindergartens, retirement homes, etc.
• Predominant big data characteristics: variety, veracity
Data sources:
• Official topographic and cadastral data
• Thermography aerial survey of Karlsruhe
• Normalized Difference Vegetation Index (Envisat, Landsat)
• 3D model of Karlsruhe (level of detail 2)
• Sensors of the meteorological service and environment agency (DWD, LUBW)
• Climate data of Karlsruhe University
• Planned: Mobile sensors
• Planned: Participatory sensing
• Planned: Radar data
• Planned: Social media analysis
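The "inter-/extrapolation of measurement data" in the approach can be illustrated with a minimal sketch. The slides do not say which interpolation method BigGIS uses; inverse distance weighting (IDW) is assumed here purely as a simple, well-known example, and the station coordinates, readings, and the function name idw_interpolate are hypothetical.

```python
import math

def idw_interpolate(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) station tuples
    using inverse distance weighting (an assumed method, not the project's)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total

# Hypothetical air-temperature readings in °C at planar coordinates (metres).
stations = [(0.0, 0.0, 30.0), (100.0, 0.0, 34.0), (0.0, 100.0, 32.0)]
estimate = idw_interpolate(stations, 50.0, 0.0)
```

Nearby stations dominate the estimate, which suits fine-grained urban temperature fields only as a first approximation; a real system would likely also account for land use, vegetation, and building geometry from the data sources listed above.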
BigGIS Pilot Application 2: Disaster/Emergency Management
• Context: Disaster management (floods, (wild)fires, chemical accidents, terrorist attacks, …)
• Goal: Within 15 min after the event, have an emergency map for fire brigades, plus continuous updates, plus predictions about further evolution (e.g., movement of a cloud of poisonous gas)
• Approach: Combine UAV remote sensing data with background knowledge and in-situ observations; data focus and dimension reduction are key
• Predominant big data characteristics: volume, variety, veracity, (velocity)
Data sources:
• Micro Rapid Mapping: micro flight robot (AiD MC8 Octocopter) with sensors such as RGB camera (Sony Smart Shot ILCE-QX1), thermal camera (FLIR Quark 2), hyperspectral camera (Cubert UHD 185 Firefly), RTK GPS
• Official topographic + cadastral data: critical infrastructures, endangered population, protected sites, …
• Crowdmapping + social media content
BigGIS Pilot Application 3: Invasive Species
• Context: Invasive species may cause serious economic damage or health problems
• Example: Drosophila suzukii
• Goal: Understand and predict the distribution patterns and dynamics of invasive species depending on vegetation, weather, etc.
• Approach: Learn distribution mechanisms from historic data
• Predominant big data characteristics: variety, veracity
Data sources:
• Land-use and land-cover data (as fine-grained as possible)
• Official observation data of species (collected by environment agencies, e.g. by traps)
• Weather observations and weather forecasts (as fine-grained as possible)
• Crowdmapping for some species
Toward a pipeline architecture for processing geo data (streams)
Figure: pipeline architecture diagram (labels only)
• Pipeline stages (role): Source (Events), Collector (Producer), Ingestion/Queueing (Broker), Pre-Analytics/Storage (Consumer), Analytics (Decider), Delivery (Endpoint, Dashboard)
• Sources: S1, S2, S3, …, Sn, connected via API1, API2, API3, …, APIn
• Messaging System (Kafka): Broker1, Broker2, …, Brokern
• Storage and processing: HDFS, EFTAS, In-Memory DB (EXASolution), λ-Architecture (Batch, Stream)
• Analytics: Integration (R, Java, ...), Predictive Analytics, Data Mining
• Delivery: Cadenza; targets Tm: Web, Mobile, ...
• Cross-cutting: Semantics / Metadata, System Management (docker)
• Primitives: Rimpl, Rexpl, Vec
• Legend: S = Source; T = Target system (e.g. visualisation); arrows = data flow; Rimpl, Rexpl = implicit/explicit raster; Vec = vector
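The stage roles named in the diagram (producer, broker, consumer, decider, endpoint) can be made concrete with a small sketch. It is only an illustration under stated assumptions: an in-process queue.Queue stands in for the Kafka messaging system, and the Event fields, the threshold, and all function names are hypothetical rather than taken from BigGIS.

```python
from dataclasses import dataclass
from queue import Queue, Empty

@dataclass(frozen=True)
class Event:
    source: str   # source label, e.g. "S1"
    lon: float
    lat: float
    value: float  # hypothetical sensor reading

def collect(raw_records, broker):
    """Collector/producer: wrap raw source records as events and enqueue them."""
    for source, lon, lat, value in raw_records:
        broker.put(Event(source, lon, lat, value))

def consume(broker):
    """Consumer/pre-analytics: drain the broker and drop implausible coordinates."""
    while True:
        try:
            event = broker.get_nowait()
        except Empty:
            return
        if -180.0 <= event.lon <= 180.0 and -90.0 <= event.lat <= 90.0:
            yield event

def decide(events, threshold):
    """Decider/analytics: keep events whose value exceeds the threshold."""
    return [e for e in events if e.value > threshold]

def deliver(alerts):
    """Endpoint/delivery: format alerts for a dashboard-like target."""
    return [f"{a.source}: {a.value:.1f} at ({a.lon:.3f}, {a.lat:.3f})" for a in alerts]

broker = Queue()  # in-process stand-in for a Kafka topic (assumption)
collect(
    [
        ("S1", 8.404, 49.009, 36.5),
        ("S2", 8.410, 49.012, 28.0),
        ("S3", 999.0, 49.0, 40.0),  # invalid longitude, dropped in pre-analytics
    ],
    broker,
)
messages = deliver(decide(consume(broker), threshold=30.0))
```

Keeping each stage behind a narrow function boundary is what would let the queue be swapped for a real broker, or the decider for a predictive model, without touching the other stages.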
Embedding into existing Spatial Data Infrastructures seems to be mandatory (cf. standardization / OGC / domain-specific standards, …)
Additional diagram label: Triple-Store
In-memory DB technology for spatial analytics is indispensable (experiments with EXASolution, SAP HANA, Oracle Spatial, …)
Special treatment of raw data from remote sensing seems to be indispensable (in many respects)
Not treated here: system parts with severe resource limitations (mobile devices, hardware on fire-brigade vehicles, hardware on board UAVs, …) as well as network limitations
- What should be moved?
- Data or code?
- Raw data or processed data?
Not yet shown here: user-feedback loops
There are many novel and useful big (or smart) data applications that are
- not so much data-driven
- not so much real-time
Not yet shown here:
- Semantic harmonization, geocoding, …
- Ideally done automatically, based on semantic metadata about sources and algorithms
The three most important big data dimensions in our experience: variety, variety, variety
>> so the "no ETL" approach seems questionable
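What harmonization "based on semantic metadata about sources" might look like can be sketched in a few lines. Everything here is an assumption for illustration: the source names, field names, units, and the harmonize function are hypothetical, and a real system would more likely keep such descriptions in a metadata store than in a hard-coded dictionary.

```python
# Hypothetical source metadata: which field holds the temperature, and its unit.
SOURCE_METADATA = {
    "weather_service": {"temp_field": "t_air", "unit": "celsius"},
    "citizen_sensor": {"temp_field": "tempF", "unit": "fahrenheit"},
    "research_station": {"temp_field": "T", "unit": "kelvin"},
}

# Unit conversions into the common target unit (degrees Celsius).
TO_CELSIUS = {
    "celsius": lambda v: v,
    "fahrenheit": lambda v: (v - 32.0) * 5.0 / 9.0,
    "kelvin": lambda v: v - 273.15,
}

def harmonize(source, record):
    """Map a source-specific record onto a common schema, driven only by metadata."""
    meta = SOURCE_METADATA[source]
    convert = TO_CELSIUS[meta["unit"]]
    return {
        "source": source,
        "lon": record["lon"],
        "lat": record["lat"],
        "temperature_c": round(convert(record[meta["temp_field"]]), 2),
    }

harmonized = [
    harmonize("weather_service", {"lon": 8.40, "lat": 49.01, "t_air": 30.0}),
    harmonize("citizen_sensor", {"lon": 8.41, "lat": 49.00, "tempF": 86.0}),
    harmonize("research_station", {"lon": 8.42, "lat": 49.02, "T": 303.15}),
]
```

Adding a new source then means adding a metadata entry rather than writing a new transformation, which is one way to keep the ETL step manageable when variety dominates.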
Some concluding and some additional remarks
• In our experience, variety is currently the key dimension
• Volume comes with remote-sensing raw data
• Velocity will come with more and more sensors
• Nevertheless, today's applications are already pretty demanding!
• Big data technology already has valuable bits and pieces to offer (in-memory DB, distributed storage and processing, virtualization)
• But embedding spatial big data applications optimally into legacy hardware/software landscapes still requires some ideas and experience
• Overall, there is a huge "usability gap" between raw data / domain-expert knowledge and the machine-learning / decision-support level
• Security and privacy may be significant blockers
• Some working areas for the technical guys:
  - Machine learning with dynamically changing spatial aggregations
  - Spatial Complex-Event Processing
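To give a flavour of what spatial complex-event processing could mean in practice, the sketch below raises a composite "hotspot" event when several distinct sensors report high values close together in space and time. The rule, the class name HotspotDetector, and all parameters and readings are hypothetical; the slides only name the working area and do not describe an approach.

```python
import math
from collections import deque

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

class HotspotDetector:
    """Emit a complex event when enough distinct sensors exceed a threshold
    within a given radius and time window (assumes in-order timestamps)."""

    def __init__(self, threshold, radius_m, window_s, min_sensors):
        self.threshold = threshold
        self.radius_m = radius_m
        self.window_s = window_s
        self.min_sensors = min_sensors
        self.recent = deque()  # (timestamp, sensor_id, lon, lat) of high readings

    def process(self, timestamp, sensor_id, lon, lat, value):
        # Expire high readings that have fallen out of the time window.
        while self.recent and timestamp - self.recent[0][0] > self.window_s:
            self.recent.popleft()
        if value <= self.threshold:
            return None
        self.recent.append((timestamp, sensor_id, lon, lat))
        # Distinct sensors with a recent high reading near the current one.
        nearby = {
            sid
            for _, sid, rlon, rlat in self.recent
            if haversine_m(lon, lat, rlon, rlat) <= self.radius_m
        }
        if len(nearby) >= self.min_sensors:
            return {"time": timestamp, "lon": lon, "lat": lat, "sensors": sorted(nearby)}
        return None

detector = HotspotDetector(threshold=35.0, radius_m=500.0, window_s=600.0, min_sensors=3)
readings = [
    (0.0, "a", 8.4000, 49.0000, 36.0),
    (60.0, "b", 8.4010, 49.0005, 37.0),
    (120.0, "c", 8.4020, 49.0010, 38.0),
    (180.0, "d", 9.0000, 49.0000, 40.0),  # far away from the other sensors
]
events = [e for e in (detector.process(*r) for r in readings) if e is not None]
```

The linear scan over recent readings is fine for a sketch; at sensor-network scale a spatial index would be the natural replacement.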
Thank you!
Dr. Andreas Abecker Dipl.-Inform.
Head of Innovation Management
Ludwig-Erhard-Allee 6
76131 Karlsruhe, Germany
www.disy.net
Tel. +49 721 16006-256
Fax +49 721 16006-05