Transcript of RT Geospatial Processing with NVIDIA GPUs, Storm, HyperDex
Business Confidential
Real-Time Geospatial Processing with NVIDIA GPUs, Storm Cluster, and Hyperdex
Operating on FusionIO
March 19, 2013
Srinivas Reddy, CTO
Our ability to collect big data is outpacing our current capability to process it and perform the computational analysis that translates it into essential information.
User communities want, or are required, to know what is happening now, or as close to real time as possible.
The Problem
Processing emitter and sensor data presents a monumental challenge: potentially billions of calculations must be performed, in real time, to produce a desired return after matching defined criteria. Accomplishing this solely on CPUs has become time- and cost-prohibitive.
The Challenge
• CPU-based systems
• Cost metrics: $1.2M
• Time metrics: ~3.6 minutes
• Not true real-time processing
Current Architecture
• GPU-based systems
• Cost metrics: $100K
• Time metrics: ~9 seconds
• Near real-time processing
• Comparison
  • 12X reduction in cost
  • 24X reduction in receipt-to-presentation time
  • Algorithm processing time: GPU 72X faster on Tesla
New Architecture
GPUs have increased processing capacity enough to make the desired results attainable, but as data grows exponentially a new problem arises: managing immense data volumes in a distributed GPU clustered environment.
A new approach, designed with these vast volumes in mind, scales almost without bound to address the mission need.
A Novel Approach
SRIS has developed an innovative solution to this challenge: an architecture for processing large amounts of real-time data utilizing Storm, NVIDIA GPUs, and Hyperdex operating on FusionIO. In this scenario, we applied the architecture to the geospatial domain.
In addition, a custom platform called MonsterWave was developed to efficiently manage and process data flows to GPU clusters.
Our Solution
• Storm Cluster
  • Nimbus (1)
  • Zookeeper (3)
  • Supervisor (16)
• MonsterWave (1)
• NVIDIA GPUs (9: 2 x K10, 1 x K20, 1 x S2050)
• Hyperdex cluster (3)
• FusionIO (1)
• ESRI Tracking Server (1) and ArcMap (1)
Carefully Chosen Components
Architecture
[Architecture diagram; data flows numbered 1-6 among:]
• Dell R720 (GPUs: K10, K20)
• HP Rocks (GPUs 0-3: T20)
• Storm Cluster
• Hyperdex Cluster on FusionIO ioDrive (320GB)
• ESRI Tracking Server (streams)
• Clients: ESRI ArcMap, Google Earth, other clients
• Data sources: GreenPlum, Netezza, MS SQL Server
• Open-source, big-data processing system
• Intended for real-time processing
• Language independent
• Complex event-processing system
• Fault-tolerance and process management
• Guaranteed message processing
• Real-time metrics: ~50,000 to ~300,000 messages per second
Storm Cluster
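Storm's guaranteed message processing works by having spouts track the ids of emitted tuples until downstream bolts acknowledge them; tuples that fail are replayed. The following is a minimal, self-contained Python sketch of that spout/bolt/ack loop; the class and method names are illustrative, not Storm's actual Java API.

```python
class Spout:
    """Emits tuples and tracks pending (un-acked) message ids."""
    def __init__(self, messages):
        self.messages = list(messages)
        self.pending = {}            # msg_id -> tuple, replayed on failure
        self.next_id = 0

    def next_tuple(self):
        if not self.messages:
            return None
        msg = self.messages.pop(0)
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):          # replay the tuple on failure
        self.messages.append(self.pending.pop(msg_id))


class FilterBolt:
    """Passes only tuples matching defined criteria (here: value > 0)."""
    def __init__(self):
        self.emitted = []

    def execute(self, tup):
        if tup > 0:
            self.emitted.append(tup)
        return True                  # processed successfully -> ack


def run_topology(spout, bolt):
    while True:
        item = spout.next_tuple()
        if item is None:
            break
        msg_id, tup = item
        if bolt.execute(tup):
            spout.ack(msg_id)
        else:
            spout.fail(msg_id)


spout = Spout([5, -3, 12, 0, 7])
bolt = FilterBolt()
run_topology(spout, bolt)
# bolt.emitted == [5, 12, 7]; spout.pending is empty (all tuples acked)
```

In real Storm the ack/fail signals are routed through dedicated acker tasks across the cluster; here a single loop stands in for the whole topology.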
MonsterWave
Job scheduling and management based on GPU awareness in a distributed GPU clustered environment.
• Job scheduling and management based on GPU awareness
  • Schedules jobs based on heuristics
  • Manages multiple GPUs in a clustered environment
  • Queues data for GP-GPU processing based on real-time GPU status
• Server
  • Sends data via socket (.NET, Java, C++) or RabbitMQ/ActiveMQ/Kafka
  • Sends data to and from Storm
• GPU monitoring capability
  • GPU inventory; GPU utilization, memory, operating temperatures
MonsterWave
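The slides do not describe MonsterWave's scheduling heuristics, so the following is only a hypothetical sketch of what "queues data for GP-GPU processing based on real-time GPU status" could look like: send each job to the least-utilized GPU that has enough free memory, and apply back-pressure when none does. The `GpuStatus` fields and the scoring rule are assumptions, not MonsterWave's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GpuStatus:
    name: str
    util_pct: float       # current utilization, 0-100
    free_mem_mb: int      # free device memory
    queue: list = field(default_factory=list)

def schedule(job_mem_mb, gpus):
    """Pick the least-utilized GPU that can hold the job's working set."""
    candidates = [g for g in gpus if g.free_mem_mb >= job_mem_mb]
    if not candidates:
        return None       # back-pressure: hold the data until a GPU frees up
    best = min(candidates, key=lambda g: (g.util_pct, -g.free_mem_mb))
    best.queue.append(job_mem_mb)
    best.free_mem_mb -= job_mem_mb
    return best

gpus = [GpuStatus("K10-0", 80.0, 4096),
        GpuStatus("K20-0", 20.0, 2048),
        GpuStatus("S2050-2", 35.0, 2560)]
g = schedule(1024, gpus)
# "K20-0" wins: lowest utilization among GPUs with enough free memory
```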
• mwhw --activeTxn
• mwhw --inventory
MonsterWave Visualization
• mwhw --issues
• mwhw --rebalance
• mwhw --refresh
• mwhw --delete
• …
MonsterWave Visualization
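Only the `mwhw` flag names appear in the slides; a hypothetical sketch of such a CLI front end, with all behaviors stubbed, might look like the following.

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="mwhw",
                                description="MonsterWave hardware tool (sketch)")
    p.add_argument("--activeTxn", action="store_true",
                   help="show active transactions")
    p.add_argument("--inventory", action="store_true",
                   help="list GPUs with utilization, memory, temperature")
    p.add_argument("--issues", action="store_true",
                   help="report GPUs with problems")
    p.add_argument("--rebalance", action="store_true",
                   help="redistribute queued work across GPUs")
    p.add_argument("--refresh", action="store_true",
                   help="re-poll GPU status")
    p.add_argument("--delete", action="store_true",
                   help="remove a queued job")
    return p

args = build_parser().parse_args(["--inventory"])
# args.inventory is True; the other flags default to False
```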
• Tesla K10
  • 4.58 teraflops single-precision floating point
  • 2 x 1536 CUDA cores (3072 cores)
  • 8GB GDDR5 memory
• Tesla K20
  • 3.52 teraflops single-precision floating point
  • 2496 CUDA cores
  • 5GB GDDR5 memory
NVIDIA GPUs
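The single-precision figures above follow from cores x clock x 2 FLOPs (one fused multiply-add per core per cycle). The core clocks used here (745 MHz for the K10, 706 MHz for the K20) are NVIDIA's published values, not from the slides.

```python
def peak_sp_tflops(cuda_cores, clock_ghz):
    # 2 FLOPs per core per cycle (fused multiply-add), GFLOPS -> TFLOPS
    return cuda_cores * clock_ghz * 2 / 1000.0

k10 = peak_sp_tflops(3072, 0.745)   # ~4.58 TFLOPS
k20 = peak_sp_tflops(2496, 0.706)   # ~3.52 TFLOPS
```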
• A GP-GPU cluster has been used in an operational testing environment with similar computational demands
• The system with 16 NVIDIA M2090 GPUs performed 160 trillion calculations per GPU per hour (2.56 quadrillion calculations per hour)
• Data processing previously taking 8 days has been reduced to approximately 1 hour on the GPU cluster
GP-GPU Operational Testing Findings
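A quick sanity check of these figures: 16 GPUs at 160 trillion calculations per GPU per hour does give 2.56 quadrillion per hour, and 8 days compressed to about 1 hour is roughly a 192x reduction.

```python
per_gpu_per_hour = 160e12                     # 160 trillion calculations
gpus = 16
cluster_per_hour = per_gpu_per_hour * gpus    # 2.56e15 = 2.56 quadrillion
speedup = (8 * 24) / 1                        # 8 days -> ~1 hour is ~192x
```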
A Searchable Distributed Key-Value Store
• Hyperspace hashing: a mapping, not an index
• Value-dependent chaining: provides atomicity, ordering, replication, and relocation
• High performance: high throughput with low variance
• Strong consistency: strong safety guarantees
• Fault tolerance: tolerates a threshold of failures
• Scalable: adding resources increases performance
• Rich API: support for complex data structures and search
Hyperdex
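Hyperspace hashing is why searches stay cheap: each attribute hashes to a coordinate on its own axis, so an object's values determine a point in a multidimensional space, and a query on a subset of attributes fixes those axes so that only the matching region is scanned. A toy, self-contained sketch of the idea (not HyperDex's implementation; attribute names are made up):

```python
from collections import defaultdict
from itertools import product

BUCKETS = 8
AXES = ("first", "last", "phone")

def coord(attrs):
    """An object's attribute values determine its point in the hyperspace."""
    return tuple(hash(attrs[a]) % BUCKETS for a in AXES)

store = defaultdict(list)   # coordinate -> objects

def put(obj):
    store[coord(obj)].append(obj)

def search(**query):
    # Each queried axis is pinned to one bucket; unqueried axes stay free,
    # so only the matching hyperplane of the space is scanned.
    ranges = [[hash(query[a]) % BUCKETS] if a in query else range(BUCKETS)
              for a in AXES]
    hits = []
    for c in product(*ranges):
        hits += [o for o in store[c]
                 if all(o[k] == v for k, v in query.items())]
    return hits

put({"first": "Ada", "last": "Lovelace", "phone": "555-0100"})
put({"first": "Alan", "last": "Turing", "phone": "555-0199"})
res = search(last="Turing")   # scans 8x1x8 cells, not the whole 8x8x8 space
```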
• Consistent low-latency performance
• Suited to caching and write-heavy databases and applications
• Stores reference data through Hyperdex
FusionIO
FusionIO as Swap
• Before FusionIO Swap
• After FusionIO Swap
• Tracking Server
  • Collects and distributes real-time GIS data
  • Filtering and alerting based on attributes of the data
  • Geofencing
  • http://www.esri.com/software/arcgis/tracking-server
• ArcMap
  • Visualization
ESRI
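Geofencing, as used for the alerting above, reduces to a point-in-polygon test on each incoming position. A standard ray-casting sketch (not ESRI's implementation; the fence coordinates are made up):

```python
def inside(lon, lat, fence):
    """Ray casting: count edge crossings of a ray extending right from the point."""
    n, hit = len(fence), False
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):              # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:                     # crossing is to the right
                hit = not hit
    return hit                                    # odd crossings -> inside

# A rectangular fence (lon, lat pairs)
fence = [(-77.1, 38.8), (-77.1, 39.0), (-76.9, 39.0), (-76.9, 38.8)]
alert_in = inside(-77.0, 38.9, fence)    # emitter inside the fence
alert_out = inside(-76.0, 38.9, fence)   # emitter outside the fence
```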
[ESRI ArcMap screenshot]
Real-time Data Visualization
Locations of Sensors and Emitters
[ESRI ArcMap screenshot]
Real-time Data Visualization
Cold Temp: Emitters at -12C (10.4F) and 80% Humidity
[ESRI ArcMap screenshot]
Real-time Data Visualization
Warm Temp: Emitters at 28C(82.4F) and 80% Humidity
[ESRI ArcMap screenshot]
Real-time Data Visualization
Composite of Emitters at Cold and Warm Temp
Lessons Learned
• Pick the correct hardware
• Topology matters in Storm
• FusionIO cache functionality requires a minimum kernel version of 3.6
Questions
http://www.sriscompany.com
The Dell PowerEdge R720 is Dell's latest 2-socket, 2U rack server, designed to run complex workloads using highly scalable memory, I/O capacity, and flexible network options. The R720 can readily handle very demanding workloads spanning multiple domains, such as data warehousing, e-commerce, virtual desktop infrastructure (VDI), and high-performance computing (HPC) as a data node.
NVIDIA Gemini (PCIe 3.0) is a new graphics architecture that will replace the Fermi architecture. The new architecture is expected to have significantly increased computational efficiency and 3-4 times higher double-precision floating-point performance per watt. (Research indicates performance of over 6 TFLOPS single precision per GP-GPU.)
The FusionIO ioDrive2 Duo is integrated within the server to offer advanced performance and scalability across applications and databases with minimized latencies. For example, a large graph (approximately 6,871,900,000 nodes) can be processed and stored on a single system. It is scalable to utilize FusionIO's direct-attached, high-performance ioMemory technology. The Fusion ioDrives provide low-latency access to graph nodes and edges, enabling a unique alternative for data-intensive computing.
Backup Slide: Hardware
• Steps:
  • Install dependencies on Nimbus and worker machines
  • Download and extract a Storm release to Nimbus and worker machines
  • Fill in mandatory configurations in storm.yaml
  • Launch daemons under supervision using the "storm" script and a supervisor of your choice
Storm Cluster Setup
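For Storm releases of this era, the mandatory storm.yaml settings were the ZooKeeper server list, the Nimbus host, a local state directory, and the worker slot ports; the hostnames and paths below are placeholders, not values from the deployment described here.

```yaml
# storm.yaml - minimal mandatory settings (hosts and paths are placeholders)
storm.zookeeper.servers:
  - "zk1.example.local"
  - "zk2.example.local"
  - "zk3.example.local"
nimbus.host: "nimbus.example.local"
storm.local.dir: "/var/storm"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
```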
• GP-GPUs with FusionIO's ioMemory device
• Hyperdex (searchable distributed key-value store for reference data)
• CUDA/C++ algorithms for processing emitter and sensor data
• MonsterWave Platform Server
• Storm (distributed, fault-tolerant, real-time computational system that guarantees delivery of data), used for ETL
• MW schedules jobs and queues data for GP-GPU processing
• Client sends real-time data to the MonsterWave Platform via socket (C++/C#/Java), via Thrift client, or via RabbitMQ/ActiveMQ
MonsterWave Design
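The socket leg of this flow can be sketched as a length-prefixed JSON message over TCP. The framing, field names, and port handling here are assumptions for illustration; the slides do not show MonsterWave's actual wire format.

```python
import json
import socket
import struct
import threading

def recv_exact(conn, n):
    """Read exactly n bytes (recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def serve_one(server, out):
    """Accept one connection and decode one length-prefixed JSON message."""
    conn, _ = server.accept()
    with conn:
        size = struct.unpack(">I", recv_exact(conn, 4))[0]
        out.append(json.loads(recv_exact(conn, size).decode()))

# Stand-in for the MonsterWave ingestion endpoint, on an ephemeral local port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
received = []
t = threading.Thread(target=serve_one, args=(server, received))
t.start()

# Client side: one real-time reading, 4-byte big-endian length prefix + JSON.
msg = {"emitter_id": 42, "lon": -77.0, "lat": 38.9}
payload = json.dumps(msg).encode()
with socket.create_connection(server.getsockname()) as c:
    c.sendall(struct.pack(">I", len(payload)) + payload)
t.join()
server.close()
```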
• Nimbus – Master Node
• Set up a Zookeeper cluster
• Worker nodes – Supervisor daemons
Storm Cluster
Storm Topology