Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

@azavea @rcheetham 21st Century Geoprocessing with Scala and GeoTrellis Robert Cheetham [email protected]

description

What got you hooked on geospatial? For me it was more than just maps: it was the ability to transform geographic data to see something new or shed light on some aspect of my environment. Whether we use GDAL, ArcGIS, GRASS or IDRISI, we have usually done this type of data transformation with desktop software tools. So why have these capabilities been relatively rare in web and mobile applications? Speed and scalability are two important factors. It has generally taken too long to calculate a viewshed, combine a stack of raster files into a weighted overlay, or generate slope and aspect from elevation data. Azavea has been working on this problem of fast, scalable geoprocessing for several years. In 2012 we released GeoTrellis (http://geotrellis.io/), an open source framework for fast, distributed geoprocessing. GeoTrellis leverages the strong type system and functional programming style of the Scala language, along with the Spark and Akka frameworks. This talk will give an overview of GeoTrellis and how it can be integrated with web mapping tools to create online geoprocessing applications for stormwater modeling, educational games, infrastructure prioritization, climate change, and transportation.
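As a rough illustration of the weighted overlay mentioned in the description, here is a minimal sketch in plain Scala. It does not use the GeoTrellis API: the `Raster` case class, the `weightedOverlay` function, and the sample layers and weights are all hypothetical, chosen only to show the cell-by-cell arithmetic.

```scala
// Minimal weighted-overlay sketch on plain arrays. This is NOT the GeoTrellis
// API; Raster and weightedOverlay are hypothetical names for illustration.
object WeightedOverlaySketch {

  // A raster as a row-major array of cell values with fixed dimensions.
  final case class Raster(cols: Int, rows: Int, cells: Array[Double]) {
    require(cells.length == cols * rows, "cell count must equal cols * rows")
  }

  // Combine layers cell by cell: out(i) = sum over layers of weight * layer(i).
  def weightedOverlay(layers: Seq[(Raster, Double)]): Raster = {
    require(layers.nonEmpty, "need at least one layer")
    val (first, _) = layers.head
    require(
      layers.forall { case (r, _) => r.cols == first.cols && r.rows == first.rows },
      "all layers must share the same dimensions"
    )
    val out = new Array[Double](first.cols * first.rows)
    for ((raster, weight) <- layers; i <- out.indices)
      out(i) += weight * raster.cells(i)
    Raster(first.cols, first.rows, out)
  }

  def main(args: Array[String]): Unit = {
    // Two made-up 2 x 2 suitability layers, scored 0 to 10.
    val slopeScore = Raster(2, 2, Array(2.0, 4.0, 6.0, 8.0))
    val roadScore  = Raster(2, 2, Array(10.0, 0.0, 10.0, 0.0))
    val result = weightedOverlay(Seq(slopeScore -> 0.75, roadScore -> 0.25))
    println(result.cells.mkString(", "))
  }
}
```

Every output cell depends only on the matching cell in each input layer, which is why this kind of operation splits cleanly across tiles or machines.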

Transcript of Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Page 1: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

@azavea

@rcheetham

21st Century Geoprocessing

with Scala and GeoTrellis

Robert Cheetham

[email protected]

Page 2: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

B Corporation

• Civic/Social impact

• Donate share of profits

Research-Driven

• 10% Research Program

• Academic Collaborations

• Open Source

• Open Data

Page 3: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Use geodata to

do stuff that matters

Page 4: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Land

Water

People

Page 5: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Ian McHarg

Page 6: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Dana Tomlin

Page 7: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Idrisi

Page 8: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

GRASS

Page 9: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

advanced

spatial analysis

on the web

Page 10: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

advanced

spatial analysis

on the web

Page 11: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3 Challenges

Page 12: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

1. Performance & Scalability

Page 13: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Big Data – Cities

2. Large Data Sets – Digital City

Page 14: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

2. Large Data Sets – Social Media

Page 15: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

2. Large Data Sets – Science

Page 16: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3. User Interface

Page 17: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3. User Interface

Page 18: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3. User Interface

Page 19: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3. User Interface

Page 20: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

3. User Interface

Page 21: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

We can do better

Page 22: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 23: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

• IO

• Geoprocessing Operations

• Distributed Processing

• Web Services

Page 24: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Real-time Processing

Page 25: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

6183 x 4992: 118 MB

4598 x 4867: 86 MB

Page 26: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Cluster-style Processing

Page 27: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

1770271 x 910139

5.8 TB
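To put the dimensions on this slide in perspective, the following sketch counts the cells in a 1770271 x 910139 raster and estimates how many tiles would be needed to cover it. The 256 x 256 tile size is an assumption made for illustration, not a figure from the talk, and `RasterSizeSketch` is a hypothetical name.

```scala
// Back-of-the-envelope sizing for the raster dimensions on the slide.
// The tile size used in main is an assumption, not taken from the talk.
object RasterSizeSketch {

  // Total cell count; Long because the product can exceed Int.MaxValue.
  def cellCount(cols: Long, rows: Long): Long = cols * rows

  // Number of square tiles needed to cover the raster, rounding partial tiles up.
  def tileCount(cols: Long, rows: Long, tileSize: Long): Long = {
    val tilesAcross = (cols + tileSize - 1) / tileSize
    val tilesDown   = (rows + tileSize - 1) / tileSize
    tilesAcross * tilesDown
  }

  def main(args: Array[String]): Unit = {
    val cols = 1770271L
    val rows = 910139L
    println(s"cells: ${cellCount(cols, rows)}")
    println(s"256 x 256 tiles (assumed tile size): ${tileCount(cols, rows, 256L)}")
  }
}
```

The cell count comes to roughly 1.6 trillion, far beyond the range of a 32-bit integer index, so a raster of this size has to be split into pieces before it can be processed.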

Page 28: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

How does it work

Page 29: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

On the shoulders of giants

Page 30: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

LocationTech Community

Page 31: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 32: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Some changes coming

Page 33: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

GeoTrellis v0.9:

• Parallel operations across tiles

• Parallel execution of operations

• Basic cluster capabilities with

+
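The idea of running an operation across tiles in parallel can be sketched with the Scala standard library alone. This is not how GeoTrellis v0.9 implements it; `Tile`, `toTiles`, and `mapTiles` are hypothetical names, and a one-dimensional chunk stands in for a real two-dimensional tile.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Sketch of a per-cell operation applied to tiles in parallel using Futures.
// NOT the GeoTrellis implementation; all names here are hypothetical.
object TileParallelSketch {

  // A tile is modeled as a small block of integer cells.
  type Tile = Array[Int]

  // Split a flat cell array into fixed-length chunks standing in for tiles.
  def toTiles(cells: Array[Int], tileLength: Int): Vector[Tile] =
    cells.grouped(tileLength).toVector

  // Apply a per-cell operation to every tile, one Future per tile, and
  // collect the results in the original tile order.
  def mapTiles(tiles: Vector[Tile])(op: Int => Int): Vector[Tile] = {
    val futures = tiles.map(tile => Future(tile.map(op)))
    Await.result(Future.sequence(futures), 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val tiles = toTiles(Array(1, 2, 3, 4, 5, 6, 7, 8), tileLength = 4)
    val doubled = mapTiles(tiles)(_ * 2)
    println(doubled.map(_.mkString("[", ", ", "]")).mkString(" "))
  }
}
```

Because each tile is processed independently, the same pattern extends from threads on one machine to workers in a cluster, which is where the sharding, caching, and fault tolerance concerns come in.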

Page 34: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

What's missing?

• Sharding raster data across the cluster

• Caching operation results across cluster

• HDFS support

• Advanced Fault tolerance

• Advanced Scheduling

• ...

+

Page 35: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

• Caches results in memory

• Ideal for iterative algorithms

• Significantly outperforms Hadoop

• Uses Hadoop's file system (HDFS)

Page 36: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

+

Page 37: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

What becomes possible?

Page 38: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Urban Forests

Page 39: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Urban Forests

Page 40: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Simulation Modeling

Page 41: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Sea Level Rise

Page 42: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Business Siting

Page 43: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Streaming Data

Page 44: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Counting Carbon

Page 45: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Digital Humanities

Page 46: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

GeoTrellis Transit

Page 47: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Travelsheds

Page 48: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Crime Analysis and Forecasting

Page 49: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

It’s the second Monday in October

and school is in session. There were 2

burglaries and 3 assaults yesterday.

The Maple Leafs are not playing this

evening. Six bars, three take-out

stores, and a high school are in the

neighborhood. The forecast is 9°C

with a 50% chance of rain this evening.

Where do you focus your 3 vehicles?

Page 50: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

It’s the second Monday in October

and school is in session. There were 2

burglaries and 3 assaults yesterday.

The Maple Leafs are not playing this

evening. Six bars, three take-out stores,

and a high school are in the

neighborhood. The forecast is 9°C

with a 50% chance of rain.

Where do you focus your 3 vehicles?

Page 51: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Data Science + Geography

Page 52: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Data Science + Geography

Page 53: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Faster is different…

Page 54: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Educational Games

Page 55: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

New Devices and Displays

Page 56: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 57: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 58: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

I am very excited

Page 59: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

advanced

spatial analysis

on the web

Page 60: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

advanced

spatial analysis

on the web

Page 61: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Land

Water

People

Page 62: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Simulation

Modeling

Forecasting

Page 63: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

What’s next?

• Multi-band

• Temporal bands (climate)

• More operations

• Tile indexes

• GeoMesa collab.

• Simpler setup

• More integration points

Page 64: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 65: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

GeoTrellis.io

Get Involved

Page 66: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Get Involved

[email protected]

Page 67: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Get Involved

IRC: #geotrellis on freenode

Page 68: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Use geodata to

do stuff that matters

Page 69: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis
Page 70: Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

jobs.azavea.com

[email protected]

@rcheetham

[is hiring]