RT Geospatial Processing with NVIDIA GPUs, Storm, HyperDex |...

Real-Time Geospatial Processing with NVIDIA GPUs, Storm Cluster, and Hyperdex Operating on FusionIO. March 19, 2013. Srinivas Reddy, CTO

Transcript of RT Geospatial Processing with NVIDIA GPUs, Storm, HyperDex |...

Page 1: RT Geospatial Processing with NVIDIA GPUs, Storm, HyperDex | …on-demand.gputechconf.com/.../S3305-RT-Geospatial-Processing-GP… · via Socket (C++/C#/Java) via Thrift Client via

0 Business Confidential

Real-Time Geospatial Processing with NVIDIA GPUs, Storm Cluster, and Hyperdex Operating on FusionIO

March 19, 2013

Srinivas Reddy, CTO

Page 2:

Our ability to collect big data is outpacing our ability to process it, analyze it, and translate it into essential information.

User communities desire or are obliged to know what is happening now or as close to real-time as possible.

The Problem

Page 3:

Processing emitter and sensor data is a monumental challenge: it can take billions of calculations to match incoming data against defined criteria and return a result in real time. Doing this on CPUs alone has become prohibitive in both time and cost.

The Challenge

Page 4:

•  CPU-based systems

•  Cost metrics - $1.2M

•  Time metrics ~ 3.6 minutes

•  Not true real-time processing

Current Architecture

Page 5:

•  GPU-based systems

•  Cost metrics - $100K

•  Time metrics – 9 seconds

•  Near real-time processing

•  Comparison

•  12X reduction in costs

•  24X reduction from receipt to presentation time

•  Algorithm processing time: GPU 72X faster on Tesla

New Architecture
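The cost and time comparison figures can be checked from the metrics quoted for the two architectures. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the 72X algorithm speedup is not derived here because the slides do not give the underlying timings.

```python
# Figures quoted on the "Current Architecture" and "New Architecture" slides.
cpu_cost_usd = 1_200_000      # $1.2M
gpu_cost_usd = 100_000        # $100K
cpu_time_s = 3.6 * 60         # ~3.6 minutes, converted to seconds
gpu_time_s = 9.0              # 9 seconds

cost_reduction = cpu_cost_usd / gpu_cost_usd
time_reduction = cpu_time_s / gpu_time_s

print(f"Cost reduction: {cost_reduction:.0f}X")
print(f"Receipt-to-presentation time reduction: {time_reduction:.0f}X")
```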

Page 6:

GPUs have increased processing capacity enough to make the desired results attainable, but as data volumes grow exponentially a new problem arises: managing immense amounts of data in a distributed, GPU-clustered environment.

A new approach that scales almost infinitely has been created with these vast numbers in mind to address the mission need.

A Novel Approach

Page 7:

SRIS has met this challenge with an architecture for processing large amounts of real-time data using Storm, NVIDIA GPUs, and Hyperdex operating on FusionIO. In this scenario, we applied the architecture to the geospatial domain.

Furthermore, a custom platform called MonsterWave was developed to efficiently manage and process data flows to GPU clusters.

Our Solution

Page 8:

•  Storm Cluster

   •  Nimbus (1)

   •  Zookeeper (3)

   •  Supervisor (16)

•  MonsterWave (1)

•  NVIDIA GPUs (9: 2 x K10, 1 x K20, 1 x S2050)

•  Hyperdex cluster (3)

•  FusionIO (1)

•  ESRI Tracking Server (1) and ArcMap (1)

Carefully Chosen Components

Page 9:

Architecture

[Figure: architecture diagram. Labeled components: R720DELL (GPU 0: K10, 0: K20); RocksHP (GPU 0-3: T20); Hyperdex Cluster on FusionIO ioDrive (320GB); Storm Cluster; Streams; ESRI Tracking Server; ESRI ArcMap; Google Earth; Other Clients; GreenPlum, Netezza, MS Sql Server. Steps are numbered 1 to 6.]

Page 10:

•  Open-source, big-data processing system

•  Intended for real-time processing

•  Language independent

•  Complex event-processing system

•  Fault-tolerance and process management

•  Guaranteed message processing

•  Real-Time Metrics:

   •  ~50,000 messages per second to ~300,000 messages per second

Storm Cluster

Page 11:

MonsterWave

Job scheduling and management based on GPU awareness in a distributed, GPU-clustered environment.

Page 12:

•  Job Scheduling and Management based on GPU Awareness

   •  Schedules jobs based on heuristics

   •  Manages multi-GPU in a clustered environment

   •  Queues data for GP-GPU processing based on real-time GPU status

•  Server

   •  Send data via socket (.Net, Java, C++); RabbitMQ/ActiveMQ/Kafka

   •  Send data to and from Storm

•  GPU Monitoring Capability

   •  GPU Inventory / GPU Utilization, Memory, Operating Temperatures

MonsterWave
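To make "scheduling based on real-time GPU status" concrete, here is a minimal sketch of one possible heuristic. The slides do not describe MonsterWave's actual policy or data model, so the `GpuStatus` fields, the `pick_gpu` function, the temperature cap, and the sample fleet are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuStatus:
    """Hypothetical snapshot of one GPU, as a monitor might report it."""
    node: str
    index: int
    utilization_pct: float   # 0-100
    free_memory_mb: int
    temperature_c: float


def pick_gpu(gpus: List[GpuStatus], job_memory_mb: int,
             max_temperature_c: float = 85.0) -> Optional[GpuStatus]:
    """Return the least-utilized GPU that can hold the job, or None.

    The heuristic (memory fit, temperature cap, then lowest utilization)
    is an assumption for illustration, not MonsterWave's actual policy.
    """
    candidates = [
        g for g in gpus
        if g.free_memory_mb >= job_memory_mb
        and g.temperature_c <= max_temperature_c
    ]
    if not candidates:
        return None  # a caller would keep the job queued
    # Prefer low utilization; break ties in favor of more free memory.
    return min(candidates, key=lambda g: (g.utilization_pct, -g.free_memory_mb))


fleet = [
    GpuStatus("node-a", 0, utilization_pct=90.0, free_memory_mb=3000, temperature_c=70.0),
    GpuStatus("node-a", 1, utilization_pct=20.0, free_memory_mb=1000, temperature_c=60.0),
    GpuStatus("node-b", 0, utilization_pct=35.0, free_memory_mb=4000, temperature_c=65.0),
    GpuStatus("node-b", 1, utilization_pct=5.0, free_memory_mb=4000, temperature_c=92.0),
]

chosen = pick_gpu(fleet, job_memory_mb=2000)
print(chosen)
```

In this sample, the GPU with too little free memory and the GPU above the temperature cap are filtered out before utilization is compared, which is why the idlest device is not necessarily the one selected.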

Page 13:

•  mwhw --activeTxn

•  mwhw --inventory

MonsterWave Visualization

Page 14:

•  mwhw --issues

•  mwhw --rebalance

•  mwhw --refresh

•  mwhw --delete

•  …

MonsterWave Visualization

Page 15:

•  Tesla K10

   •  4.58 teraflops of single precision floating point

   •  2 x 1536 CUDA cores (3072 cores)

   •  8GB GDDR5 memory

•  Tesla K20

   •  3.52 teraflops of single precision floating point

   •  2496 CUDA cores

   •  5GB GDDR5 memory

NVIDIA GPUs

Page 16:

•  A GP-GPU cluster has been used in an operational testing environment with similar computational demands

•  The system, with 16 NVIDIA M2090 GPUs, performed 160 trillion calculations per GPU per hour (2.560 quadrillion calculations per hour)

•  Data processing that previously took 8 days has been reduced to approximately 1 hour on the GPU cluster.

GP-GPU Operational Testing Findings
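The aggregate throughput and the implied speedup follow directly from the figures in these findings. A short sketch of the arithmetic, using only the quoted numbers (variable names are illustrative):

```python
# Figures quoted in the operational testing findings.
gpu_count = 16
calcs_per_gpu_per_hour = 160 * 10**12      # 160 trillion

# Aggregate throughput across the cluster.
total_calcs_per_hour = gpu_count * calcs_per_gpu_per_hour

# Wall-clock reduction: 8 days before, approximately 1 hour after.
hours_before = 8 * 24
hours_after = 1
speedup = hours_before / hours_after

print(f"Total: {total_calcs_per_hour / 10**15:.2f} quadrillion calculations per hour")
print(f"Approximate speedup: {speedup:.0f}X")
```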

Page 17:

A Searchable Distributed Key-Value Store

•  Hyperspace hashing: Mapping, not an index

•  Value-dependent chaining: Provides Atomicity, Ordering, Replication, and Relocation

•  High-Performance: High throughput with low variance

•  Strong Consistency: Strong safety guarantees

•  Fault Tolerance: Tolerates a threshold of failures

•  Scalable: Adding resources increases performance

•  Rich API: Support for complex data structures and search

Hyperdex
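The phrase "mapping, not an index" can be illustrated with a toy model of hyperspace hashing: each attribute is hashed onto its own axis, so an object lands in one cell of a multi-dimensional grid, and a search on a subset of attributes only needs the cells consistent with those values. This is a simplified sketch, not Hyperdex's implementation; the schema, the bucket count, and the function names are assumptions.

```python
import hashlib
from itertools import product

ATTRIBUTES = ("emitter_id", "sensor_type", "region")  # hypothetical schema
BUCKETS_PER_AXIS = 4                                   # hypothetical grid size


def axis_bucket(value: str) -> int:
    """Deterministically hash one attribute value to a bucket on its axis."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS_PER_AXIS


def coordinate(obj: dict) -> tuple:
    """Map an object to one cell of the hyperspace, one axis per attribute."""
    return tuple(axis_bucket(obj[a]) for a in ATTRIBUTES)


def cells_for_search(query: dict) -> list:
    """Cells that could hold matches for a partially specified query.

    A specified attribute pins its axis to one bucket; an unspecified
    attribute leaves the whole axis open.
    """
    axes = [
        [axis_bucket(query[a])] if a in query else range(BUCKETS_PER_AXIS)
        for a in ATTRIBUTES
    ]
    return list(product(*axes))


obj = {"emitter_id": "E-1001", "sensor_type": "rf", "region": "north"}
cell = coordinate(obj)
search_cells = cells_for_search({"sensor_type": "rf"})

print("object cell:", cell)
print("cells to contact:", len(search_cells), "of", BUCKETS_PER_AXIS ** len(ATTRIBUTES))
```

In a real deployment each cell, or region of cells, would be assigned to a server, so narrowing the set of candidate cells narrows the set of servers a search has to contact.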

Page 18:

•  Consistent Low Latency Performance

•  Caching and write-heavy databases and applications

•  Stores reference data through Hyperdex

FusionIO

Page 19:

FusionIO as Swap

•  Before FusionIO Swap

•  After FusionIO Swap

Page 20:

•  Tracking Server

   •  Collect and distribute real-time GIS data

   •  Filtering and Alerting

   •  Based on attributes of the data

   •  Geofencing

   •  http://www.esri.com/software/arcgis/tracking-server

•  ArcMap

   •  Visualization

ESRI
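Geofencing comes down to testing whether a reported position falls inside a defined boundary. The sketch below shows the standard ray-casting point-in-polygon test as one way to do that; it is not ESRI Tracking Server's API, the fence coordinates are made up, and treating longitude and latitude as planar coordinates is an assumption that only holds for small fences.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def inside_geofence(point: Point, fence: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips the state
    return inside


# Hypothetical rectangular fence and two sample positions.
fence = [(-77.10, 38.80), (-76.90, 38.80), (-76.90, 39.00), (-77.10, 39.00)]
print(inside_geofence((-77.00, 38.90), fence))
print(inside_geofence((-77.20, 38.90), fence))
```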

Page 21:

•  Insert ESRI ArcMap <SCREENSHOT>

Real-time Data Visualization

Locations of Sensors and Emitters

Page 22:

•  Insert ESRI ArcMap <SCREENSHOT>

Real-time Data Visualization

Cold Temp: Emitters at -12C (10.4F) and 80% Humidity

Page 23:

•  Insert ESRI ArcMap <SCREENSHOT>

Real-time Data Visualization

Warm Temp: Emitters at 28C (82.4F) and 80% Humidity

Page 24:

•  Insert ESRI ArcMap <SCREENSHOT>

Real-time Data Visualization

Composite of Emitters at Cold and Warm Temp

Page 25:

Lessons Learned

•  Pick the correct hardware

•  Topology matters in Storm

•  FusionIO cache functionality requires kernel 3.6 at minimum

Page 26:

Questions

http://www.sriscompany.com

Page 27:

•  The Dell PowerEdge R720: Dell's latest 2-socket, 2U rack server, designed to run complex workloads using highly scalable memory, I/O capacity, and flexible network options. The R720 can readily handle very demanding workloads spanning multiple domains, such as data warehousing, e-commerce, virtual desktop infrastructure (VDI), and high performance computing (HPC) as a data node.

•  NVIDIA Gemini PCI X3.0: A new graphics architecture that will replace the Fermi architecture. The new architecture is expected to have significantly increased computational efficiency and 3-4 times higher double-precision floating point performance per watt. (Research indicates performance of over 6 TFLOPS single precision per GP-GPU.)

•  FusionIO ioDrive2 Duo: Integrated within the server to offer advanced performance and scalability across applications and databases with minimized latencies. For example, a large graph (approximately 6,871,900,000 nodes) can be processed and stored on a single system. Scalable to utilize FusionIO's direct-attached high performance ioMemory technology. The Fusion ioDrives provide low latency access to graph nodes and edges, enabling a unique alternative for data intensive computing.

Backup Slide: Hardware

Page 28:

•  Steps:

   •  Install dependencies on Nimbus and worker machines

   •  Download and extract a Storm release to Nimbus and worker machines

   •  Fill in mandatory configurations into storm.yaml

   •  Launch daemons under supervision using the "storm" script and a supervisor of your choice

Storm Cluster Setup
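For the configuration step, the sketch below renders a minimal storm.yaml as text. The key names are the ones used by Storm releases of this era (newer releases replace `nimbus.host` with `nimbus.seeds`); the host names, local directory, and `render_storm_yaml` helper are hypothetical, and the three Zookeeper hosts simply mirror the three-node Zookeeper count listed among the components.

```python
def render_storm_yaml(zookeeper_servers, nimbus_host, local_dir, slot_ports):
    """Render a minimal storm.yaml as text.

    Host names and paths are supplied by the caller; nothing here is
    taken from the presenter's actual cluster.
    """
    lines = ["storm.zookeeper.servers:"]
    lines += [f'    - "{host}"' for host in zookeeper_servers]
    lines.append(f'nimbus.host: "{nimbus_host}"')
    lines.append(f'storm.local.dir: "{local_dir}"')
    # One worker slot per port on each Supervisor machine.
    lines.append("supervisor.slots.ports:")
    lines += [f"    - {port}" for port in slot_ports]
    return "\n".join(lines) + "\n"


config_text = render_storm_yaml(
    zookeeper_servers=["zk1.example.com", "zk2.example.com", "zk3.example.com"],
    nimbus_host="nimbus.example.com",
    local_dir="/var/storm",
    slot_ports=[6700, 6701, 6702, 6703],
)
print(config_text)
```

The same file would be placed on Nimbus and on every Supervisor machine, so that all daemons agree on where Zookeeper and Nimbus live.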

Page 29:

GP-GPUs with FusionIO's ioMemory device

Hyperdex (searchable distributed key-value store for reference data)

CUDA/C++ algorithms for processing emitter and sensor data

MonsterWave Platform Server

Storm (distributed, fault-tolerant, real-time computational system to guarantee delivery of data) used for ETL

MW schedules jobs and queues data for GP-GPU processing

Client sends real-time data to MonsterWave Platform

via Socket (C++/C#/Java)

via Thrift Client

via RabbitMQ/ActiveMQ

MonsterWave Design

Page 30:

• Nimbus – Master Node

•  Set up a Zookeeper cluster

• Worker nodes – Supervisor daemons

Storm Cluster

Page 31:

Storm Topology