GeoMesa: Scalable Geospatial Analytics
visiongeomatique2014
Chris
terms
• GeoMesa: an open-source project organized under LocationTech
• scalable: if you can continue to solve problems as N >> 1 with no more change than
adding hardware and minor tweaks, you scale
• geospatial: data that contain a geographic reference, a date/time, and zero
or more additional attributes
• analytics: formally, a logical decomposition via truth-preserving transformations;
informally, any useful derivation (whether deductive or inductive)
outline
• part 1: why? (3 minutes)
• part 2: how? (10 minutes)
• part 3: what? (10 minutes)
• part 4: who? (2 minutes)
part 1: why?
[why] which X (points) are close to location Y?
• hundreds: PostgreSQL and brute force
– full table scan
• hundreds of thousands: PostgreSQL and PostGIS
– GeoTools API
– GiST (think R-trees)
• hundreds of millions: a funny thing happens as you collect much more data...
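At the first rung (hundreds of points), "which points are close to Y?" can be answered by testing every row. The Python sketch below shows that full table scan. It assumes the points are held in memory as (name, latitude, longitude) tuples and that "close" means great-circle distance within a radius. The haversine formula and the 6371 km mean Earth radius are choices made for this sketch, not details from the talk. Every query touches every point, so cost grows linearly with N. An index such as GiST is meant to avoid exactly that.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an assumption of this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def points_within(points, lat, lon, radius_km):
    """Full table scan: test every point and keep the names of those within radius_km."""
    return [name for name, plat, plon in points
            if haversine_km(lat, lon, plat, plon) <= radius_km]

# Decimal-degree versions of the city coordinates used in the geohashing slides.
cities = [
    ("Ottawa", 45.4208, -75.6900),
    ("Montréal", 45.5000, -73.5667),
    ("Charlottesville", 38.0300, -78.4789),
]
print(points_within(cities, 45.4208, -75.6900, 200.0))  # ['Ottawa', 'Montréal']
```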
[why] dissolution of large-volume data
[why] perhaps SQL is the bottleneck?
• NoSQL databases, such as Apache Accumulo
• trade ACID for distributed processing, storage
• but there’s no PostGIS for Accumulo, so how does the canonical diagram of an Accumulo (key,
value) pair help us answer some simple questions...
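The canonical Accumulo (key, value) pair can be modelled roughly as follows. A key consists of a row ID, a column family, a column qualifier, a column visibility, and a timestamp. Entries are kept sorted by key, with newer timestamps first. The cheap operation is a scan over a contiguous range of rows.

This is a toy, in-memory Python sketch, not the Accumulo client API. It ignores the delete flag, tablets, and iterators, and the names `Key`, `SortedTable`, and `scan_rows` are made up for illustration. What it shows is that the store answers "which comes first?" questions about keys cheaply. Anything we want to query efficiently must therefore be encoded into key order.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple

class Key(NamedTuple):
    """Simplified Accumulo-style key (hypothetical model, not the real class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

def sort_key(k: Key):
    # Row, family, qualifier, and visibility sort ascending;
    # the timestamp sorts descending, so the newest version comes first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)

class SortedTable:
    """Toy table: entries kept in key order and fetched by row range."""
    def __init__(self):
        self._keys = []     # sort_key tuples, always sorted
        self._entries = []  # (Key, value) pairs, parallel to _keys

    def put(self, key: Key, value: bytes):
        sk = sort_key(key)
        i = bisect_right(self._keys, sk)
        self._keys.insert(i, sk)
        self._entries.insert(i, (key, value))

    def scan_rows(self, start_row: str, end_row: str):
        """Return the entries whose row lies in [start_row, end_row)."""
        lo = bisect_left(self._keys, (start_row,))
        hi = bisect_left(self._keys, (end_row,))
        return self._entries[lo:hi]

table = SortedTable()
table.put(Key("Quebec", "info", "kind", "", 1), b"province")
table.put(Key("Ontario", "info", "kind", "", 1), b"province")
table.put(Key("Ontario", "info", "kind", "", 2), b"province (v2)")
for key, value in table.scan_rows("O", "P"):
    print(key.row, key.timestamp, value)
# Ontario 2 b'province (v2)'
# Ontario 1 b'province'
```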
[why] questions that ought to be easy for an index to answer
• easy question: Which comes first, “Ontario” or “Quebec”?
• similar question: Which comes first, [image] or [image]?
• simplify: think only of representative cities, and treat them strictly as points
[why] geohashing
City                              Coordinates (courtesy Wikipedia)   Geohash
Ottawa                            45°25′15″N 75°41′24″W              f244m
Montréal                          45°30′N 73°34′W                    f25dv
Charlottesville (Virginia, USA)   38°1′48″N 78°28′44″W               dqb0q
● Two distinct orders:
○ Order by name: Charlottesville, Montréal, Ottawa
○ Order by longitude or latitude or geohash: Charlottesville, Ottawa, Montréal
● Lexicoding location -> geohash provides a deterministic, repeatable ordering
○ with this, we can index, store, and query points by lexicographic ranges
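The lexicoding step is the standard geohash algorithm, which works in three steps:

1. Repeatedly halve the longitude and latitude intervals.
2. Interleave the resulting bits, starting with longitude.
3. Emit one base-32 character per five bits.

The minimal Python sketch below was written for this illustration and is not taken from GeoMesa. It reproduces the three geohashes in the table above and the "longitude, latitude, or geohash" ordering.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Encode (lat, lon) by interleaving longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    even = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)

# Degrees-minutes-seconds from the table, converted to signed decimal degrees.
cities = {
    "Ottawa": (45 + 25/60 + 15/3600, -(75 + 41/60 + 24/3600)),
    "Montréal": (45 + 30/60, -(73 + 34/60)),
    "Charlottesville": (38 + 1/60 + 48/3600, -(78 + 28/60 + 44/3600)),
}
hashes = {name: geohash(lat, lon) for name, (lat, lon) in cities.items()}
print(hashes)  # {'Ottawa': 'f244m', 'Montréal': 'f25dv', 'Charlottesville': 'dqb0q'}
print(sorted(cities, key=hashes.get))  # ['Charlottesville', 'Ottawa', 'Montréal']
```

Because nearby points tend to share geohash prefixes, a string range such as ["f2", "f3") selects a rectangular cell. That property is what lets a sorted key-value store answer spatial questions.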
[why] build-versus-buy remorse
• PostgreSQL+PostGIS has some nice functions
– geometric predicates
– secondary indexes
– standard GeoTools API
• some of our data are (multi) lines, (multi) polygons
• time is often more than a secondary consideration
• sometimes, analysis work needn’t be done on the same old client
– distributed across the tablet servers?
– using tools like Spark?
– streaming?
[why] synthesis
part 2: how?
[how] GeoMesa features
• GeoTools API
• sharding distributes queries uniformly
• flexible SFC can incorporate time
• supports (multi) point, (multi) line, (multi) polygon geometries
• secondary indexes and a multi-stage query planner
• burgeoning raster support via WCS
• GeoServer as a plugin-based GUI
• WPS standard for computation (and function chaining)
[how] GeoTools API
[how] sharding
[how] space-filling curve progression
%~#s%3#r%0,3#gh%yyyyMM#d::%~#s%3,2#gh::%~#s%5,2#gh%HHmm#d%id
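The pattern above packs a shard, coarse and fine geohash characters, and date fields into an Accumulo row, column family, and column qualifier. The Python sketch below is one reading of it, and that reading is an assumption:

- `%~#s` sets `~` as the separator.
- `%3#r` is a shard number in [0, 3).
- `%o,n#gh` takes n geohash characters starting at offset o.
- `%yyyyMM#d` and `%HHmm#d` are date formats.
- `%id` is the feature ID.
- `::` separates row, column family, and column qualifier.

The function name `build_key`, the CRC32-based shard choice, and the sample geohash are hypothetical.

```python
import zlib
from datetime import datetime

def build_key(feature_id: str, gh: str, when: datetime, shards: int = 3):
    """Hypothetical reading of
    %~#s%3#r%0,3#gh%yyyyMM#d::%~#s%3,2#gh::%~#s%5,2#gh%HHmm#d%id
    Returns (row, column_family, column_qualifier)."""
    if len(gh) < 7:
        raise ValueError("this layout needs at least 7 geohash characters")
    # %3#r: shard number in [0, shards); deriving it from a CRC32 of the
    # feature ID (so it is deterministic) is an assumption of this sketch.
    shard = zlib.crc32(feature_id.encode("utf-8")) % shards
    # Row: shard ~ geohash chars 0..2 ~ year and month
    row = "~".join([str(shard), gh[0:3], when.strftime("%Y%m")])
    # Column family: geohash chars 3..4
    cf = gh[3:5]
    # Column qualifier: geohash chars 5..6 ~ hour and minute ~ feature ID
    cq = "~".join([gh[5:7], when.strftime("%H%M"), feature_id])
    return row, cf, cq

# "f244m2q" is an arbitrary 7-character geohash chosen for illustration.
row, cf, cq = build_key("feature-1", "f244m2q", datetime(2014, 5, 1, 14, 30))
print(row, cf, cq)  # row ends in "~f24~201405"; cf is "4m"; cq is "2q~1430~feature-1"
```

Under this reading, leading with the shard spreads writes across tablet servers. The coarse geohash and month come next, so points that are close in space and time land in nearby rows. A spatio-temporal query then becomes a small number of range scans, presumably one set per shard.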
[how] multi-step query planning
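A two-stage plan can be illustrated with geohashes alone. A coarse stage turns the query box into a key range and scans only that range. A fine stage then applies the exact predicate to the surviving candidates.

The Python sketch below is a deliberately simplified illustration, not GeoMesa's planner: it uses a single range, no time, and no secondary index, and the helper names are made up. It relies on one geometric fact. The geohash cell named by the common prefix of the box's south-west and north-east corner hashes is an axis-aligned rectangle containing both corners, so it contains the whole box. Boxes crossing the antimeridian are not handled.

```python
from bisect import bisect_left

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=5):
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, n, use_lon = [], 0, 0, True
    while len(out) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits, rng[0] = (bits << 1) | 1, mid
        else:
            bits, rng[1] = bits << 1, mid
        use_lon, n = not use_lon, n + 1
        if n == 5:
            out.append(BASE32[bits])
            bits, n = 0, 0
    return "".join(out)

def common_prefix(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def bbox_query(index, min_lat, min_lon, max_lat, max_lon):
    """index: list of (geohash, name, lat, lon) tuples sorted by geohash.
    Returns (candidate names from stage 1, matching names from stage 2)."""
    # Stage 1 (coarse): one range scan over keys that start with the corners' common prefix.
    prefix = common_prefix(geohash(min_lat, min_lon), geohash(max_lat, max_lon))
    start = bisect_left(index, (prefix,))
    candidates = []
    for entry in index[start:]:
        if not entry[0].startswith(prefix):
            break
        candidates.append(entry)
    # Stage 2 (fine): exact bounding-box test, applied only to the candidates.
    hits = [name for _, name, lat, lon in candidates
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]
    return [name for _, name, _, _ in candidates], hits

cities = {"Ottawa": (45.4208, -75.69), "Montréal": (45.5, -73.5667),
          "Charlottesville": (38.03, -78.4789)}
index = sorted((geohash(lat, lon), name, lat, lon) for name, (lat, lon) in cities.items())
print(bbox_query(index, 45.0, -76.0, 46.0, -74.0))
# (['Ottawa', 'Montréal'], ['Ottawa'])
```

Stage 1 never reads Charlottesville. Stage 2 rejects Montréal, which shares the "f2" cell but lies east of the box.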
[how] non-point geometries
[how] rasters + GeoWave integration
[how] supporting other frameworks
[how] GeoServer as a plug-in GUI
[how] Web Processing Service
• WPS is another OGC standard
• Think of it as an abstract function definition, mapping input types to output types, and defining
the computation that occurs between the two.
• WPS processes can be chained.
• This provides a natural extension mechanism for GeoMesa.
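The "abstract function with typed inputs and outputs" idea, and why it makes chaining natural, can be sketched in Python. Everything here is hypothetical: `Process` and `chain` are made-up names. Real WPS processes are described and invoked through the OGC WPS interface (for example via GeoServer), not through Python objects. The sketch only shows that when each process declares its input and output types, a chain can be type-checked before it runs.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Process:
    """Hypothetical stand-in for a WPS process: a named, typed input -> output mapping."""
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]

def chain(*processes: Process) -> Process:
    """Compose processes left to right after checking that each output feeds the next input."""
    for a, b in zip(processes, processes[1:]):
        if not issubclass(a.output_type, b.input_type):
            raise TypeError(f"{a.name} produces {a.output_type.__name__}, "
                            f"but {b.name} expects {b.input_type.__name__}")

    def run(value):
        for p in processes:
            value = p.run(value)
        return value

    return Process(" -> ".join(p.name for p in processes),
                   processes[0].input_type, processes[-1].output_type, run)

# Two toy processes over lists of (lat, lon) points.
only_north = Process("only_north", list, list, lambda pts: [p for p in pts if p[0] >= 40.0])
count = Process("count", list, int, len)

pipeline = chain(only_north, count)
print(pipeline.name, pipeline.run([(45.42, -75.69), (45.5, -73.57), (38.03, -78.48)]))
# only_north -> count 2
```

The chain is itself a `Process`, so chains can be chained again. That is the sense in which typed processes give a natural extension mechanism.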
[how] synthesis
Those are merely the highlights of some of GeoMesa’s current features…
… so what?
part 3: what?
[what] distributing computation
[what] queries that interpolate both position and time
[what] K-nearest neighbor
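A single-machine, brute-force baseline makes the K-nearest-neighbor query concrete: compute the distance to every point and keep the k smallest with a heap. This Python sketch is not GeoMesa's implementation. A scalable version would use the index to avoid scoring every point. The sketch only pins down what the query returns, and haversine distance is an assumption of the sketch.

```python
import heapq
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km; the 6371 km mean Earth radius is an assumption."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def k_nearest(points, lat, lon, k):
    """Return the k (name, distance_km) pairs nearest to (lat, lon), closest first."""
    distances = ((name, haversine_km(lat, lon, plat, plon)) for name, plat, plon in points)
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])

points = [("Ottawa", 45.4208, -75.69), ("Montréal", 45.5, -73.5667),
          ("Charlottesville", 38.03, -78.4789)]
for name, d in k_nearest(points, 45.5017, -73.5673, 2):  # query point near Montréal
    print(name, round(d))
```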
[what] clustering (DBSCAN)
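For reference, here is the textbook DBSCAN algorithm:

- A point with at least min_pts neighbors within eps (counting itself) is a core point.
- Clusters grow outward from core points.
- A non-core point within eps of a core point joins that cluster as a border point.
- Everything else is noise.

This single-machine Python sketch uses brute-force neighbor queries (O(n^2)) and planar Euclidean distance. Both are assumptions suitable only for small, projected data, and the sketch says nothing about how GeoMesa distributes the computation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        # Brute-force region query; includes i itself.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster    # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue             # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```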
[what] near-real-time streaming track analytics with web sockets
[what] track viewer utility
part 4: who?
[who] LocationTech and the greater community
[who] synthesis
questions
For extended questions:
For additional reading:
geomesa.org
For code:
github.com/locationtech/geomesa