Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

  • Date posted: 10-Jul-2015
  • Category: Technology

description

What got you hooked on geospatial? For me it was more than just maps: it was the ability to transform geographic data to see something new or shed light on some aspect of my environment. Whether with GDAL, ArcGIS, GRASS or IDRISI, we have usually done this type of data transformation in desktop software. So why have these capabilities been relatively rare in web and mobile applications? Speed and scalability are two important factors. It has generally taken too long to calculate a viewshed, combine a stack of raster files into a weighted overlay, or generate slope and aspect from elevation data. Azavea has been working on this problem of fast, scalable geoprocessing for several years. In 2012 we released GeoTrellis (http://geotrellis.io/), an open source framework for fast, distributed geoprocessing. GeoTrellis leverages the strong type system and functional programming style of the Scala language, together with the Spark and Akka frameworks. This talk will give an overview of GeoTrellis and how it can be integrated with web mapping tools to create online geoprocessing applications for stormwater modeling, educational games, infrastructure prioritization, climate change, and transportation.

Transcript of Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

Distributed Computing

@azavea @rcheetham

21st Century Geoprocessing with Scala and GeoTrellis

Robert Cheetham
cheetham@azavea.com

B Corporation
Civic/Social impact
Donate share of profits

Research-Driven
10% Research Program
Academic Collaborations
Open Source
Open Data

Use geodata to do stuff that matters

Land

Water

People

Ian McHarg

He wrote a book in 1969 called Design with Nature, which focused on sustainable and ecological design. Among other concepts, he described how a series of inputs drawn on transparent acetate sheets could be combined as a set of map overlays to identify the best site for a particular facility, road, or whatever else is being sited.
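McHarg's stacked acetate sheets correspond closely to what is now called a weighted overlay of raster layers. The following is a minimal sketch in plain Python, not GeoTrellis code: the layer names, the 0 to 10 scores and the weights are all made up for illustration.

```python
# Illustrative weighted overlay. Layer names, scores and weights are
# hypothetical; this is not the GeoTrellis API.

def weighted_overlay(layers, weights):
    """Combine equally sized score grids into one weighted suitability grid."""
    total = sum(weights)
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [
            sum(w * layer[r][c] for layer, w in zip(layers, weights)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def best_cell(grid):
    """Return (row, col) of the highest-scoring cell."""
    return max(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: grid[rc[0]][rc[1]],
    )

# Each layer scores every cell from 0 (unsuitable) to 10 (ideal).
slope_score = [[2, 8], [9, 4]]
soil_score = [[5, 6], [7, 3]]
access_score = [[1, 9], [4, 10]]

# Slope counts twice as much as the other two layers.
suitability = weighted_overlay([slope_score, soil_score, access_score], [2, 1, 1])
site = best_cell(suitability)
```

Each output cell depends only on the same cell in each input layer, which is why this kind of operation splits so easily across tiles and machines.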

Dana Tomlin

Idrisi

GRASS

advanced spatial analysis on the web

Stuff on a map

3 Challenges

1. Performance & Scalability

Big Data Cities

2. Large Data Sets - Digital City

2. Large Data Sets - Social Media

2. Large Data Sets - Science

3. User Interface

We can do better

IO

Geoprocessing Operations

Distributed Processing

Web Services

Real-time Processing

6183 x 4992
4598 x 4867
118 MB
86 MB

Cluster-style Processing

1770271 x 910139
5.8 TB
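To get a feel for why the small rasters suit real-time processing while the large one needs a cluster, here is a rough size estimate. It assumes 4 bytes per cell, which the slides do not state, and it reads the large raster as 1770271 x 910139 cells. Under those assumptions the results land close to the sizes shown on the slides.

```python
# Rough in-memory size of a single-band raster.
# BYTES_PER_CELL = 4 is an assumption (e.g. 32-bit cells); the slides do
# not state the cell type.
BYTES_PER_CELL = 4

def raster_bytes(cols, rows, bytes_per_cell=BYTES_PER_CELL):
    """Uncompressed size in bytes of a cols x rows raster."""
    return cols * rows * bytes_per_cell

def to_mib(n_bytes):
    return n_bytes / 2**20

def to_tib(n_bytes):
    return n_bytes / 2**40

small = to_mib(raster_bytes(6183, 4992))        # real-time example
large = to_tib(raster_bytes(1770271, 910139))   # cluster example
print(f"{small:.1f} MiB, {large:.2f} TiB")
```

The large raster is tens of thousands of times bigger than the small one, so it cannot be held in the memory of a single typical machine.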

How does it work?

On the shoulders of giants

LocationTech Community

Some changes coming

Parallel operations across tiles
Parallel execution of operations
Basic cluster capabilities
with GeoTrellis v0.9: +
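The idea of parallel operations across tiles can be sketched in a few lines of plain Python. This is an illustration only, not how GeoTrellis implements it: the tile size, the doubling operation and the helper names are all hypothetical.

```python
# Sketch of "parallel operations across tiles" in plain Python.
# The tile size, the local operation and the helper names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def split_into_tiles(raster, tile_size):
    """Yield (row_offset, col_offset, tile) for square tiles of the raster."""
    for r in range(0, len(raster), tile_size):
        for c in range(0, len(raster[0]), tile_size):
            tile = [row[c:c + tile_size] for row in raster[r:r + tile_size]]
            yield r, c, tile

def local_op(tile):
    """A cell-by-cell ("local") operation: double every value."""
    return [[2 * v for v in row] for row in tile]

def map_tiles(raster, tile_size, op):
    """Apply op to each tile in a thread pool and stitch the results back."""
    out = [[None] * len(raster[0]) for _ in raster]
    tiles = list(split_into_tiles(raster, tile_size))
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: op(t[2]), tiles)
        for (r, c, _), result in zip(tiles, results):
            for i, row in enumerate(result):
                out[r + i][c:c + len(row)] = row
    return out

raster = [[r * 4 + c for c in range(4)] for r in range(4)]
doubled = map_tiles(raster, 2, local_op)
```

Threads keep the sketch short. A real system would hand tiles to separate cores or machines, but the split, map and stitch structure stays the same.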

Sharding raster data across the cluster
Caching operation results across cluster
HDFS support
Advanced fault tolerance
Advanced scheduling
...

What's missing?

+

Caches results in memory
Ideal for iterative algorithms
Significantly outperforms Hadoop
Uses Hadoop's file system (HDFS)

+

What becomes possible?

Urban Forests

6. Spring 2010 - Urban Forest Map
-> Huge bummer

11. USDA iTree - ecosystem benefits
-> Not only creating a tree inventory system but also calculating ecosystem benefits and financial benefits for trees
-> All we need is tree species and diameter

Simulation Modeling

16. USDA SBIR - prioritization, modeling and simulation - finished a prototype

Sea Level Rise

Business Siting
Heat map

Streaming Data

Counting Carbon

Digital Humanities

GeoTrellis Transit

Travelsheds

Crime Analysis and Forecasting

It's the second Monday in October and school is in session. There were 2 burglaries and 3 assaults yesterday. The Maple Leafs are not playing this evening. Six bars, three take-out stores, and a high school are in the neighborhood. The forecast is 9°C with a 50% chance of rain this evening.

Where do you focus your 3 vehicles?

Data Science + Geography

Faster is different

Educational Games

New Devices and Displays

I am very excited

advanced spatial analysis on the web

Simulation, modeling and forecasting

Land

Water

People

Simulation

Modeling

Forecasting

Multi-band
Temporal bands (climate)
More operations
Tile indexes
GeoMesa collaboration
Simpler setup
More integration points

What's next?

GeoTrellis.io

Get Involved

geotrellis-user@googlegroups.com

IRC: #geotrellis on freenode

Use geodata to do stuff that matters

jobs.azavea.com

cheetham@azavea.com

@rcheetham

[is hiring]
