Transcript of RT Geospatial Processing with NVIDIA GPUs, Storm, HyperDex
Business Confidential
Real-Time Geospatial Processing with NVIDIA GPUs, Storm Cluster, and Hyperdex
Operating on FusionIO
March 19, 2013
Srinivas Reddy, CTO
Our ability to collect big data is outpacing our current capability to process it and perform the computational analysis that translates it into essential information.
User communities want, or are required, to know what is happening now, or as close to real time as possible.
The Problem
Processing emitter and sensor data presents a monumental challenge: potentially billions of calculations must be performed, in real time, to produce a desired return after matching defined criteria. Accomplishing this solely on CPUs has become time- and cost-prohibitive.
The Challenge
• CPU-based systems
• Cost metrics: $1.2M
• Time metrics: ~3.6 minutes
• Not true real-time processing
Current Architecture
• GPU-based systems
• Cost metrics: $100K
• Time metrics: ~9 seconds
• Near real-time processing
• Comparison
  • 12X reduction in cost
  • 24X reduction in receipt-to-presentation time
  • Algorithm processing time: GPU 72X faster on Tesla
New Architecture
GPUs have increased processing capacity enough to make the desired results attainable, but as data grows exponentially a new problem arises: managing immense data volumes in a distributed GPU clustered environment.
A new approach, designed with these vast volumes in mind, scales almost without bound to address the mission need.
A Novel Approach
SRIS has developed an innovative solution to this challenge: an architecture for processing large amounts of real-time data utilizing Storm, NVIDIA GPUs, and Hyperdex operating on FusionIO. In this scenario, we applied the architecture to the geospatial domain.
In addition, a custom platform called MonsterWave was developed to efficiently manage and process data flows to GPU clusters.
Our Solution
• Storm Cluster
  • Nimbus (1)
  • Zookeeper (3)
  • Supervisor (16)
• MonsterWave (1)
• NVIDIA GPUs (9: 2 x K10, 1 x K20, 1 x S2050)
• Hyperdex cluster (3)
• FusionIO (1)
• ESRI Tracking Server (1) and ArcMap (1)
Carefully Chosen Components
Architecture
[Architecture diagram; data flows numbered 1-6 among:]
• Dell R720 (GPUs: K10, K20)
• HP Rocks (GPUs 0-3: T20)
• Storm Cluster
• Hyperdex Cluster on FusionIO ioDrive (320GB)
• ESRI Tracking Server (streams)
• Clients: ESRI ArcMap, Google Earth, other clients
• Data sources: GreenPlum, Netezza, MS SQL Server
• Open-source, big-data processing system
• Intended for real-time processing
• Language independent
• Complex event-processing system
• Fault-tolerance and process management
• Guaranteed message processing
• Real-time metrics: ~50,000 to ~300,000 messages per second
Storm Cluster
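Storm's guaranteed message processing works by having spouts track the ids of emitted tuples until downstream bolts acknowledge them; tuples that fail are replayed. The following is a minimal, self-contained Python sketch of that spout/bolt/ack loop; the class and method names are illustrative, not Storm's actual Java API.

```python
class Spout:
    """Emits tuples and tracks pending (un-acked) message ids."""
    def __init__(self, messages):
        self.messages = list(messages)
        self.pending = {}            # msg_id -> tuple, replayed on failure
        self.next_id = 0

    def next_tuple(self):
        if not self.messages:
            return None
        msg = self.messages.pop(0)
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):          # replay the tuple on failure
        self.messages.append(self.pending.pop(msg_id))


class FilterBolt:
    """Passes only tuples matching defined criteria (here: value > 0)."""
    def __init__(self):
        self.emitted = []

    def execute(self, tup):
        if tup > 0:
            self.emitted.append(tup)
        return True                  # processed successfully -> ack


def run_topology(spout, bolt):
    while True:
        item = spout.next_tuple()
        if item is None:
            break
        msg_id, tup = item
        if bolt.execute(tup):
            spout.ack(msg_id)
        else:
            spout.fail(msg_id)


spout = Spout([5, -3, 12, 0, 7])
bolt = FilterBolt()
run_topology(spout, bolt)
# bolt.emitted == [5, 12, 7]; spout.pending is empty (all tuples acked)
```

In real Storm the ack/fail signals are routed through dedicated acker tasks across the cluster; here a single loop stands in for the whole topology.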
MonsterWave
Job scheduling and management based on GPU awareness in a distributed GPU clustered environment.
• Job scheduling and management based on GPU awareness
  • Schedules jobs based on heuristics
  • Manages multiple GPUs in a clustered environment
  • Queues data for GP-GPU processing based on real-time GPU status
• Server
  • Sends data via socket (.NET, Java, C++) or RabbitMQ/ActiveMQ/Kafka
  • Sends data to and from Storm
• GPU monitoring capability
  • GPU inventory; GPU utilization, memory, operating temperatures
MonsterWave
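The slides do not describe MonsterWave's scheduling heuristics, so the following is only a hypothetical sketch of what "queues data for GP-GPU processing based on real-time GPU status" could look like: send each job to the least-utilized GPU that has enough free memory, and apply back-pressure when none does. The `GpuStatus` fields and the scoring rule are assumptions, not MonsterWave's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GpuStatus:
    name: str
    util_pct: float       # current utilization, 0-100
    free_mem_mb: int      # free device memory
    queue: list = field(default_factory=list)

def schedule(job_mem_mb, gpus):
    """Pick the least-utilized GPU that can hold the job's working set."""
    candidates = [g for g in gpus if g.free_mem_mb >= job_mem_mb]
    if not candidates:
        return None       # back-pressure: hold the data until a GPU frees up
    best = min(candidates, key=lambda g: (g.util_pct, -g.free_mem_mb))
    best.queue.append(job_mem_mb)
    best.free_mem_mb -= job_mem_mb
    return best

gpus = [GpuStatus("K10-0", 80.0, 4096),
        GpuStatus("K20-0", 20.0, 2048),
        GpuStatus("S2050-2", 35.0, 2560)]
g = schedule(1024, gpus)
# "K20-0" wins: lowest utilization among GPUs with enough free memory
```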
• mwhw --activeTxn
• mwhw --inventory
MonsterWave Visualization
• mwhw --issues
• mwhw --rebalance
• mwhw --refresh
• mwhw --delete
• …
MonsterWave Visualization
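Only the `mwhw` flag names appear in the slides; a hypothetical sketch of such a CLI front end, with all behaviors stubbed, might look like the following.

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="mwhw",
                                description="MonsterWave hardware tool (sketch)")
    p.add_argument("--activeTxn", action="store_true",
                   help="show active transactions")
    p.add_argument("--inventory", action="store_true",
                   help="list GPUs with utilization, memory, temperature")
    p.add_argument("--issues", action="store_true",
                   help="report GPUs with problems")
    p.add_argument("--rebalance", action="store_true",
                   help="redistribute queued work across GPUs")
    p.add_argument("--refresh", action="store_true",
                   help="re-poll GPU status")
    p.add_argument("--delete", action="store_true",
                   help="remove a queued job")
    return p

args = build_parser().parse_args(["--inventory"])
# args.inventory is True; the other flags default to False
```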
• Tesla K10
  • 4.58 teraflops single-precision floating point
  • 2 x 1536 CUDA cores (3072 cores)
  • 8GB GDDR5 memory
• Tesla K20
  • 3.52 teraflops single-precision floating point
  • 2496 CUDA cores
  • 5GB GDDR5 memory
NVIDIA GPUs
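The single-precision figures above follow from cores x clock x 2 FLOPs (one fused multiply-add per core per cycle). The core clocks used here (745 MHz for the K10, 706 MHz for the K20) are NVIDIA's published values, not from the slides.

```python
def peak_sp_tflops(cuda_cores, clock_ghz):
    # 2 FLOPs per core per cycle (fused multiply-add), GFLOPS -> TFLOPS
    return cuda_cores * clock_ghz * 2 / 1000.0

k10 = peak_sp_tflops(3072, 0.745)   # ~4.58 TFLOPS
k20 = peak_sp_tflops(2496, 0.706)   # ~3.52 TFLOPS
```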
• A GP-GPU cluster has been used in an operational testing environment with similar computational demands
• The system with 16 NVIDIA M2090 GPUs performed 160 trillion calculations per GPU per hour (2.56 quadrillion calculations per hour)
• Data processing previously taking 8 days has been reduced to approximately 1 hour on the GPU cluster
GP-GPU Operational Testing Findings
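A quick sanity check of these figures: 16 GPUs at 160 trillion calculations per GPU per hour does give 2.56 quadrillion per hour, and 8 days compressed to about 1 hour is roughly a 192x reduction.

```python
per_gpu_per_hour = 160e12                     # 160 trillion calculations
gpus = 16
cluster_per_hour = per_gpu_per_hour * gpus    # 2.56e15 = 2.56 quadrillion
speedup = (8 * 24) / 1                        # 8 days -> ~1 hour is ~192x
```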
A Searchable Distributed Key-Value Store
• Hyperspace hashing: a mapping, not an index
• Value-dependent chaining: provides atomicity, ordering, replication, and relocation
• High performance: high throughput with low variance
• Strong consistency: strong safety guarantees
• Fault tolerance: tolerates a threshold of failures
• Scalable: adding resources increases performance
• Rich API: support for complex data structures and search
Hyperdex
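Hyperspace hashing is why searches stay cheap: each attribute hashes to a coordinate on its own axis, so an object's values determine a point in a multidimensional space, and a query on a subset of attributes fixes those axes so that only the matching region is scanned. A toy, self-contained sketch of the idea (not HyperDex's implementation; attribute names are made up):

```python
from collections import defaultdict
from itertools import product

BUCKETS = 8
AXES = ("first", "last", "phone")

def coord(attrs):
    """An object's attribute values determine its point in the hyperspace."""
    return tuple(hash(attrs[a]) % BUCKETS for a in AXES)

store = defaultdict(list)   # coordinate -> objects

def put(obj):
    store[coord(obj)].append(obj)

def search(**query):
    # Each queried axis is pinned to one bucket; unqueried axes stay free,
    # so only the matching hyperplane of the space is scanned.
    ranges = [[hash(query[a]) % BUCKETS] if a in query else range(BUCKETS)
              for a in AXES]
    hits = []
    for c in product(*ranges):
        hits += [o for o in store[c]
                 if all(o[k] == v for k, v in query.items())]
    return hits

put({"first": "Ada", "last": "Lovelace", "phone": "555-0100"})
put({"first": "Alan", "last": "Turing", "phone": "555-0199"})
res = search(last="Turing")   # scans 8x1x8 cells, not the whole 8x8x8 space
```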
• Consistent low-latency performance
• Suited to caching and write-heavy databases and applications
• Stores reference data through Hyperdex
FusionIO
FusionIO as Swap
• Before FusionIO Swap
• After FusionIO Swap
• Tracking Server
  • Collects and distributes real-time GIS data
  • Filtering and alerting based on attributes of the data
  • Geofencing
  • http://www.esri.com/software/arcgis/tracking-server
• ArcMap
  • Visualization
ESRI
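Geofencing, as used for the alerting above, reduces to a point-in-polygon test on each incoming position. A standard ray-casting sketch (not ESRI's implementation; the fence coordinates are made up):

```python
def inside(lon, lat, fence):
    """Ray casting: count edge crossings of a ray extending right from the point."""
    n, hit = len(fence), False
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):              # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:                     # crossing is to the right
                hit = not hit
    return hit                                    # odd crossings -> inside

# A rectangular fence (lon, lat pairs)
fence = [(-77.1, 38.8), (-77.1, 39.0), (-76.9, 39.0), (-76.9, 38.8)]
alert_in = inside(-77.0, 38.9, fence)    # emitter inside the fence
alert_out = inside(-76.0, 38.9, fence)   # emitter outside the fence
```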
[ESRI ArcMap screenshot]
Real-time Data Visualization
Locations of Sensors and Emitters
[ESRI ArcMap screenshot]
Real-time Data Visualization
Cold Temp: Emitters at -12C (10.4F) and 80% Humidity
[ESRI ArcMap screenshot]
Real-time Data Visualization
Warm Temp: Emitters at 28C(82.4F) and 80% Humidity
[ESRI ArcMap screenshot]
Real-time Data Visualization
Composite of Emitters at Cold and Warm Temp
Lessons Learned
• Pick the correct hardware
• Topology matters in Storm
• FusionIO cache functionality requires a minimum kernel version of 3.6
Questions
http://www.sriscompany.com
The Dell PowerEdge R720 is Dell's latest 2-socket, 2U rack server, designed to run complex workloads using highly scalable memory, I/O capacity, and flexible network options. The R720 can readily handle very demanding workloads spanning multiple domains, such as data warehousing, e-commerce, virtual desktop infrastructure (VDI), and high-performance computing (HPC) as a data node.
NVIDIA Gemini (PCIe 3.0) is a new graphics architecture that will replace the Fermi architecture. The new architecture is expected to have significantly increased computational efficiency and 3-4 times higher double-precision floating-point performance per watt. (Research indicates performance of over 6 TFLOPS single precision per GP-GPU.)
The FusionIO ioDrive2 Duo is integrated within the server to offer advanced performance and scalability across applications and databases with minimized latencies. For example, a large graph (approximately 6,871,900,000 nodes) can be processed and stored on a single system. It is scalable to utilize FusionIO's direct-attached, high-performance ioMemory technology. The Fusion ioDrives provide low-latency access to graph nodes and edges, enabling a unique alternative for data-intensive computing.
Backup Slide: Hardware
• Steps:
  • Install dependencies on Nimbus and worker machines
  • Download and extract a Storm release to Nimbus and worker machines
  • Fill in mandatory configurations in storm.yaml
  • Launch daemons under supervision using the "storm" script and a supervisor of your choice
Storm Cluster Setup
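For Storm releases of this era, the mandatory storm.yaml settings were the ZooKeeper server list, the Nimbus host, a local state directory, and the worker slot ports; the hostnames and paths below are placeholders, not values from the deployment described here.

```yaml
# storm.yaml - minimal mandatory settings (hosts and paths are placeholders)
storm.zookeeper.servers:
  - "zk1.example.local"
  - "zk2.example.local"
  - "zk3.example.local"
nimbus.host: "nimbus.example.local"
storm.local.dir: "/var/storm"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
```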
• GP-GPUs with FusionIO's ioMemory device
• Hyperdex (searchable distributed key-value store for reference data)
• CUDA/C++ algorithms for processing emitter and sensor data
• MonsterWave Platform Server
• Storm (distributed, fault-tolerant, real-time computational system that guarantees delivery of data), used for ETL
• MW schedules jobs and queues data for GP-GPU processing
• Client sends real-time data to the MonsterWave Platform via socket (C++/C#/Java), via Thrift client, or via RabbitMQ/ActiveMQ
MonsterWave Design
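The socket leg of this flow can be sketched as a length-prefixed JSON message over TCP. The framing, field names, and port handling here are assumptions for illustration; the slides do not show MonsterWave's actual wire format.

```python
import json
import socket
import struct
import threading

def recv_exact(conn, n):
    """Read exactly n bytes (recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def serve_one(server, out):
    """Accept one connection and decode one length-prefixed JSON message."""
    conn, _ = server.accept()
    with conn:
        size = struct.unpack(">I", recv_exact(conn, 4))[0]
        out.append(json.loads(recv_exact(conn, size).decode()))

# Stand-in for the MonsterWave ingestion endpoint, on an ephemeral local port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
received = []
t = threading.Thread(target=serve_one, args=(server, received))
t.start()

# Client side: one real-time reading, 4-byte big-endian length prefix + JSON.
msg = {"emitter_id": 42, "lon": -77.0, "lat": 38.9}
payload = json.dumps(msg).encode()
with socket.create_connection(server.getsockname()) as c:
    c.sendall(struct.pack(">I", len(payload)) + payload)
t.join()
server.close()
```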
• Nimbus – Master Node
• Set up a Zookeeper cluster
• Worker nodes – Supervisor daemons
Storm Cluster
Storm Topology