Big Spatial Data - Esri · HBase Impala Spark Pig Cascading Sqoop Flume. What Next? Load data!...

Big Spatial Data Bruce Sanderson


Page 1:

Big Spatial Data

Bruce Sanderson

Page 2:

Z R Y Q P

T R Y T O

R E A D T H I S

IF YOU CAN THEN

YOU ARE A MUCH BETTER PERSON

THAN I CAN EVER HOPE TO BE IN THIS LIFETIME OR THE NEXT

GOODBYE CRUEL WORLD IT WAS NICE KNOWING YOU BUT NOW IT IS TIME TO GO FOR GOOD

Page 3:

BIG DATA

Page 4:

THE PROBLEM

WITH BIG DATA

IS NOT THAT IT’S BIG

Page 5:

It’s that there is LOTS OF IT

Page 8:

IN ORDER TO SEE CLEARLY WE NEED

BETTER TOOLS

THAT BRING THINGS BACK

INTO FOCUS

Page 10:

Agenda

• Our Hadoop implementation

• Big Spatial Data workflows

• What should you be doing?

Page 12:

SPATIAL DATA

MANAGEMENT WINDOW

TOO SHORT

Page 13:

UNSTRUCTURED

DATA

Page 14:

Drivers

• Volume of data

• Need to load/process quicker

• Data that doesn’t fit into RDBMS

Page 15:

If you’re not a Big Data expert…

But have Big Data needs

How do you get started?

Page 16:

Basic Hadoop Stack

Hadoop Distributed File System (HDFS)

MapReduce

Yet Another Resource Negotiator (YARN)

Commodity

Servers

Page 17:

MapReduce

• MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

• Yuck!
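
The definition above is easier to follow with a toy example. The following is a minimal single-process sketch of the map, shuffle, and reduce phases, not how Hadoop itself is implemented. The record layout (a comma-separated line starting with a vehicle ID) and all function names are assumptions made for illustration; on a real cluster the framework runs these phases in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Emit one (key, value) pair per GPS record, keyed by vehicle ID.
    # Assumes the vehicle ID is the first comma-separated field.
    vehicle_id = line.split(",")[0]
    yield vehicle_id, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all values for one key into a single result.
    return key, sum(values)


def count_points(lines):
    # Chain the three phases: map every line, group by key, reduce each group.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `count_points` on a list of record lines returns a dictionary of record counts per vehicle ID.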

Page 18:

Name Node

DataNode 1 DataNode 2 DataNode 3

HDFS Client

Secondary NameNode

Page 19:

Hive

Map Reduce

Hadoop Distributed File System (HDFS)

HBase Impala Spark

Pig Cascading

Sqoop

Flume

Page 20:

What Next?

Page 22:

Load data!

hadoop fs -put N:\07_2014_gps.txt /user/vehicle/data.txt
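
If the load needs to run from a script rather than by hand, the same command can be wrapped in Python. This is a minimal sketch that assumes the `hadoop` executable is on the PATH; the function names are made up for illustration, and the HDFS destination is written with forward slashes because HDFS paths are POSIX-style.

```python
import subprocess


def build_put_command(local_path, hdfs_path):
    # 'hadoop fs -put <localsrc> <dst>' copies a local file into HDFS.
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def put_file(local_path, hdfs_path):
    # check=True raises CalledProcessError if the hadoop CLI exits non-zero.
    subprocess.run(build_put_command(local_path, hdfs_path), check=True)
```

For the file on the slide, the call would be `put_file(r"N:\07_2014_gps.txt", "/user/vehicle/data.txt")`.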

Page 23:

Sqoop

• Command line interface for transferring data between relational databases and HDFS

• Supports joins and where clauses

• Quest Data Connector - Oracle
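
To make the command line interface concrete, here is a hedged sketch that assembles a `sqoop import` invocation with a where clause. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--where`, `--num-mappers`) are standard Sqoop import options; the helper name and every value passed to it (JDBC URL, user, table, target directory) are placeholders, not details from the talk.

```python
def build_sqoop_import(jdbc_url, username, table, target_dir,
                       where=None, num_mappers=1):
    # Assemble the argument list for 'sqoop import'.
    # -P makes Sqoop prompt for the password instead of taking it as text.
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if where:
        # Restrict the imported rows with a SQL WHERE condition.
        cmd += ["--where", where]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where Sqoop is installed.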

Page 24:

Hive

• SQL on Hadoop

• External tables

• Schema on read

hive> CREATE EXTERNAL TABLE well (id INT, api STRING, name STRING)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE
    > LOCATION '/home/admin/userdata';
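
"Schema on read" means the files stay as plain text and the column names and types are applied only when a query reads them. The sketch below imitates that idea in pure Python for the `well` table above; it is an illustration of the concept, not Hive's implementation, and the helper name is an assumption. As in Hive, a field that is missing or cannot be cast to the declared type comes back as a null value instead of failing the read.

```python
# Column names and types taken from the 'well' table definition.
WELL_SCHEMA = [("id", int), ("api", str), ("name", str)]


def read_with_schema(text, schema=WELL_SCHEMA):
    rows = []
    for line in text.splitlines():
        # Fields are split on ',' to match FIELDS TERMINATED BY ','.
        fields = line.split(",")
        row = {}
        for i, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[i])
            except (IndexError, ValueError):
                # Missing or badly typed field: return None, like a SQL NULL.
                row[name] = None
        rows.append(row)
    return rows
```

Because nothing is validated at load time, a bad record only shows up as nulls when it is queried.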

Page 25:

Impala

• SQL on HDFS

• Bypasses MapReduce

• Most operations in-memory

Page 26:

Spark

• Up to 100x faster than Hadoop MapReduce

• Java, Scala, Python, R

• Access diverse data sources

- HDFS, HBase, Hive, S3

• SparkSQL
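
As a rough idea of what working with Spark from Python looks like, here is a sketch that counts GPS points per vehicle from a text file in HDFS. The record layout (`vehicle_id,timestamp,latitude,longitude`) and both function names are assumptions; the slides do not show the actual file format or job. The `pyspark` import sits inside the function so the parsing helper can be tried without Spark installed.

```python
def parse_gps_line(line):
    # Assumed layout: vehicle_id,timestamp,latitude,longitude
    vehicle_id, timestamp, lat, lon = line.strip().split(",")
    return vehicle_id, (timestamp, float(lat), float(lon))


def points_per_vehicle(hdfs_path):
    # Requires a Spark installation; imported lazily for that reason.
    from pyspark import SparkContext

    sc = SparkContext(appName="gps-points-per-vehicle")
    try:
        counts = (
            sc.textFile(hdfs_path)           # one record per line
            .map(parse_gps_line)             # (vehicle_id, point)
            .map(lambda kv: (kv[0], 1))      # (vehicle_id, 1)
            .reduceByKey(lambda a, b: a + b)
            .collect()
        )
    finally:
        sc.stop()
    return dict(counts)
```

The same map and reduce-by-key steps as in a MapReduce job appear here as chained method calls.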

Page 27:

ArcMap

ArcPy

HDFS

Snakebite

• Pure Python client

• Allows us to use ArcPy!
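
A short sketch of what talking to HDFS through Snakebite might look like from a Python script. It relies on Snakebite's `Client(host, port)` constructor and its `ls` method, which takes a list of paths and yields one dictionary per entry with a `path` key. The helper names, the host name, and the default port are assumptions for illustration.

```python
def connect(host, port=8020):
    # Imported lazily; requires the snakebite package to be installed.
    from snakebite.client import Client
    return Client(host, port)


def list_hdfs_paths(client, directory):
    # client.ls expects a list of paths and yields a dict per directory entry.
    return [entry["path"] for entry in client.ls([directory])]
```

For example, `list_hdfs_paths(connect("namenode.example.com"), "/user/vehicle")` would return the paths found under that directory, which a script could then pass on to ArcPy tools.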

Page 28:

Big Spatial Data workflows

• Vehicle Tracking Analysis

• Directional Survey Analysis

• Custom GeoAnalytics Interface

Page 29:

GPS Data

Vehicle Tracking Analysis

HDFS

ArcMap

ArcPy

put

Page 31:

GeoJson

Directional Survey Analysis

HDFS

Directional Surveys

Sqoop

FME ArcMap SDE
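
The workflow above lists GeoJson as one of its stages, but the slides do not show how it is produced. As one possible sketch, the function below turns a list of survey stations into a GeoJSON LineString feature. The station layout (longitude, latitude, depth in metres), the `well_id` property, and the function name are all assumptions for illustration.

```python
import json


def survey_to_geojson(well_id, stations):
    # stations: iterable of (longitude, latitude, depth_m) tuples.
    # GeoJSON positions are ordered [longitude, latitude, elevation];
    # depth below the surface is stored as a negative elevation.
    coords = [[lon, lat, -depth] for lon, lat, depth in stations]
    if len(coords) < 2:
        raise ValueError("a LineString needs at least two positions")
    feature = {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coords},
        "properties": {"well_id": well_id},
    }
    return json.dumps(feature)
```

The result is a plain string, so it can be written to a file that downstream tools read as GeoJSON.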

Page 32:

Wells Leases Rigs

JDBC

Hive Tables

Charts Renderer Tiles

GeoAnalytics Interface

HDFS

User interface

Page 33:

What did we learn?

• There is a need for some Big Data tools

• Mostly using Spark directly against HDFS

Page 34:

What does that mean for you?

Page 35:

10.4 Release

• Big Data built into ArcGIS

• Push button deployment

• Any number of nodes

• Spark is part of the framework!

Page 36:

• ‘Real-Time GIS: The Road Ahead’

• Room 14 B, Wed 1:30-2:45pm

Page 40:

So you no longer need to be a Big Data expert to use it in your geospatial environment.

Page 41:

Hadoop

Invest in Spark

Learn Java, Scala, or Python

No Hadoop

Wait for 10.4!

Study Spark

Learn Java, Scala, or Python

Page 42:

Summary

• Do you need Big Spatial Data Tools?

- Yes – but probably worth waiting for 10.4

• Spark

Page 43:

Stay in touch:

[email protected]