High Performance Hardware for Data Analysis

NYC 2014

Mike Pittaro, [email protected], [email protected]

@pmikeyp

www.slideshare.net/lhrc_mikeyp / github.com/lhrc-mikeyp

description

Presentation from PyData NYC 2014. A video is available on the PyData YouTube channel: https://www.youtube.com/watch?v=mx0j7uBdD8k

Transcript of High Performance Hardware for Data Analysis


Page 2: High Performance Hardware for Data Analysis


About This Talk

• We can’t cover everything about hardware in a 40-minute session.

• We can go deep enough to help you:

– Understand tradeoffs and balanced architectures

– Ask the right questions about choices

– Learn from what others are doing

• My approach today:

1. Why look at high-performance hardware?

2. Look at a production cluster design

3. Look at the choices and tradeoffs behind the scenes

About me:

• Principal Architect for Big Data Solutions at Dell

– Part of the Enterprise Solutions Group in Dell engineering

– I develop clusters and reference architectures for our customers

– Background in Supercomputing, Data Warehousing and Data Integration

– Python is my preferred programming language

Page 3: High Performance Hardware for Data Analysis


Why Consider High Performance Hardware?

• The choice of hardware can have a large impact

– On performance

– On budget

• Understanding the hardware helps with the software

– Scalable and parallel systems deal with both

• Cloud hosting may not be an option

– You can’t or won’t delegate critical infrastructure to third parties.

– Operating costs are too high

– You need every bit of performance you can get.

• Data is heavy

– Local clusters are persistent

– Large data transfers may not be a viable option.

Page 4: High Performance Hardware for Data Analysis

The Problem with Big Data Hardware

• Customers want results

– Performance

– Predictability

– Reliability

– Cost Management

– Proven Solution

– Tested Configuration

• There are many options

– Servers

– Processors, GPUs

– Drives

– Networking

– Jargon and Acronyms

– Lack of Solid Information

Page 5: High Performance Hardware for Data Analysis

A Reference Architecture Fills the Gap

• Tested Server Configurations

• Tested Network Configurations

• Base Software Configuration

– Big Data Software

– OS Infrastructure

– Operational Infrastructure

• Predefined Configuration

– Recommended starting point

– Customization is possible

The secret to a good architecture is balance.

[Diagram: Price, Performance, Fault Zones; Application Workload and Software]

Page 6: High Performance Hardware for Data Analysis


Cluster Architecture

• The Dell In-Memory Appliance for Cloudera


Page 7: High Performance Hardware for Data Analysis


Dell In-Memory Appliance – Summary Specs

Cluster             Starter    Mid-Size   Small Enterprise   Maximum
Data Nodes          4          12         20                 44
Total Memory        1536 GB    4608 GB    7680 GB            26896 GB
Total Storage       176 TB     528 TB     880 TB             2112 TB
Processing Cores    80         280        400                880
Racks (42U)         1          2          2                  4

Data Node Characteristic   Configuration
Server                     Dell R720xd (2 rack units)
Processor                  Two Intel Xeon E5-2670 v2, 2.5 GHz, 25 MB cache, 10 cores
Memory                     384 GB
Memory Speed               1866 MT/s DRAM
Disks                      12 x 4 TB SATA, 3.0 Gbps (48 TB)
Networking                 Dual 10GbE interfaces, Active/Active bonding
Management Network         Two x 1GbE interfaces

Page 8: High Performance Hardware for Data Analysis


Server Examples

• M1000e Blade Chassis (10U)

• 4-Socket R920 (4U)

• 2-Socket R730xd (2U)

Page 9: High Performance Hardware for Data Analysis


Server Choices

• 4 Socket Servers (e.g. Dell R920)

– Optimized for enterprise applications

› Large RDBMS servers, SAP, SAP HANA, Microsoft Exchange

– Very large memory capacity available (up to 6 TB)

– Often use direct or network attached storage

• ‘Blade’ Servers (e.g. Dell M620, M1000e Chassis)

– Pluggable Processor and Storage modules

– The backplane and chassis contain a lot of shared interconnect logic

– Flexibility for enterprise applications

› Virtualization is popular

• 2 Socket Servers (e.g. Dell R620, R630, R720, R730)

– Many options available

– 1U and 2U chassis footprints

– Developed for Web Hosting and Large Scale-Out Clusters

– Dell internal storage: up to 12 x 3.5” or 24 x 2.5” drives in the chassis

Page 10: High Performance Hardware for Data Analysis


Intel Dual Socket Architecture (Grantley)

Haswell CPU: up to 18 cores; TDP up to 145 W (server), 160 W (workstation)

Socket: Socket-R3

Scalability: 2S capability

Memory: 4 x DDR4 channels; 1333, 1600, 1866 (2 DPC), 2133 (1 DPC); RDIMM, LRDIMM

QPI: 2 x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s

PCIe: 40 x PCIe 3.0 lanes (2.5, 5, 8 GT/s); PCIe extensions: Dual Cast, Atomics

[Diagram: two Intel® Xeon® processor E5-2600 v3 sockets linked by two QPI channels, each with its own DDR4 memory channels and 40 lanes of PCIe 3.0; Intel® C610 series chipset (WBG); LAN up to 4x10GbE]
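The PCIe and LAN figures above can be compared with a quick bandwidth calculation. This is a minimal sketch, assuming the PCIe 3.0 rate of 8 GT/s per lane with 128b/130b line encoding and ignoring protocol overhead; the function names are hypothetical. It compares raw link capacity only: 40 lanes per socket carry several times the line rate of four 10GbE ports.

```python
def pcie3_gb_per_s(lanes: int) -> float:
    """Per-direction PCIe 3.0 bandwidth in GB/s: 8 GT/s per lane, 128b/130b encoding."""
    payload_bits_per_s = lanes * 8e9 * 128 / 130  # bits left after line encoding
    return payload_bits_per_s / 8 / 1e9           # bits -> bytes -> GB

def ethernet_gb_per_s(ports: int, gbit_per_port: float) -> float:
    """Raw Ethernet line rate in GB/s, with no protocol overhead subtracted."""
    return ports * gbit_per_port / 8

print(f"PCIe 3.0 x40: {pcie3_gb_per_s(40):.1f} GB/s per direction")
print(f"4 x 10GbE:    {ethernet_gb_per_s(4, 10):.1f} GB/s per direction")
```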

Page 11: High Performance Hardware for Data Analysis


Intel Processor Generations

Product             Xeon E5-2600          Xeon E5-2600 v2       Xeon E5-2600 v3
Microarchitecture   Sandy Bridge          Ivy Bridge            Haswell
Cores / Threads     8 / 16                12 / 24               18 / 36
Last Level Cache    Up to 20 MB           Up to 30 MB           Up to 45 MB
Max Memory Speed    1600 MT/s DDR3        1866 MT/s DDR3        2133 MT/s DDR4
QPI (GT/s)          2 ch; 6.4, 7.2, 8.0   2 ch; 6.4, 7.2, 8.0   2 ch; 6.4, 8.0, 9.6
Max DIMMs           12                    12                    12
Max Clock Speed     3.1 GHz / 3.8 GHz     3.7 GHz / 3.8 GHz     3.7 GHz / 3.8 GHz
Process Tech        32 nm                 22 nm                 22 nm
Year                2012                  2013                  2014
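The maximum memory speeds in the table translate into theoretical peak bandwidth per socket. This is a minimal sketch, assuming all four memory channels per processor are populated and each channel has a 64-bit data path (ECC bits not counted); sustained bandwidth in practice is lower, and running 2 DIMMs per channel can force a lower speed, as the previous slide shows for DDR4.

```python
CHANNELS_PER_SOCKET = 4  # all three generations have 4 memory channels per processor
BYTES_PER_TRANSFER = 8   # 64-bit data path per channel (ECC bits not counted)

def peak_memory_gb_per_s(mt_per_s: int, channels: int = CHANNELS_PER_SOCKET) -> float:
    """Theoretical peak DRAM bandwidth for one socket, in GB/s."""
    return channels * mt_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

for product, speed in [("E5-2600", 1600), ("E5-2600 v2", 1866), ("E5-2600 v3", 2133)]:
    print(f"{product:11s} {speed} MT/s -> {peak_memory_gb_per_s(speed):.1f} GB/s per socket")
```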

Page 12: High Performance Hardware for Data Analysis

Selecting Processors

• Assume 1-1.5 Hadoop tasks per core; this allows headroom for other processes

– Enable Hyperthreading for Hadoop and Spark

– Hyperthreading for other workloads: it depends

• Hadoop: aim for 1 core per disk spindle

• Impala: can easily handle more spindles and cores

• Spark: I/O depends on the back-end storage

• A faster processor is better

– Most Hadoop jobs are I/O bound, not processor bound

– Hadoop compression uses processor cycles

– Fewer cores with a faster clock is often a good tradeoff

– The Map / Reduce balance depends on the actual workload

– It’s hard to optimize further without knowing the actual workload
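The 1-1.5 tasks per core guideline turns into a simple per-node sizing range. This is a minimal sketch; the function name is hypothetical, it counts physical cores rather than hyperthreads, and how the slots are split between map and reduce tasks still depends on the workload.

```python
import math

def hadoop_task_slots(physical_cores: int,
                      tasks_per_core: tuple[float, float] = (1.0, 1.5)) -> tuple[int, int]:
    """Range of concurrent Hadoop tasks for one node, from the 1-1.5 tasks/core rule of thumb."""
    low, high = tasks_per_core
    return math.floor(physical_cores * low), math.floor(physical_cores * high)

# Appliance data node: two 10-core Xeon E5-2670 v2 processors.
print(hadoop_task_slots(2 * 10))  # (20, 30)
# A node with 8 cores in total, as in some of the customer configurations.
print(hadoop_task_slots(8))       # (8, 12)
```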

Page 13: High Performance Hardware for Data Analysis


Selecting Memory

• DDR3 versus DDR4, RDIMM versus LRDIMM

– DDR3 is cheaper now; DDR4 is faster (about 15%)

• DIMM Sizes

– 8GB, 16GB, 32GB, 64GB, 128GB

• Sweet Spot

– Varies, around 32GB right now

• Balance the memory banks

– 4 memory channels per processor

– 4 x 16GB better than 2 x 32GB

• Server Class Memory

– It’s all ECC checked

– Dell Server BIOS options to optimize checking method
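The "4 x 16GB better than 2 x 32GB" rule follows from the channel count: both give 64 GB, but only the first puts a DIMM on every channel, and bandwidth scales with the number of populated channels. This is a minimal sketch, assuming identical DIMMs and defining "balanced" simply as an equal DIMM count on every channel; real population rules also involve ranks and DIMMs per channel. The function name is hypothetical.

```python
CHANNELS_PER_SOCKET = 4  # memory channels per processor, from the slide

def memory_layout(dimm_count: int, dimm_gb: int, sockets: int = 1) -> tuple[int, bool]:
    """Return (total GB, balanced?) for a set of identical DIMMs.

    'Balanced' here means every memory channel receives the same number of DIMMs.
    """
    channels = CHANNELS_PER_SOCKET * sockets
    balanced = dimm_count > 0 and dimm_count % channels == 0
    return dimm_count * dimm_gb, balanced

print(memory_layout(4, 16))  # (64, True): one 16GB DIMM on each of the 4 channels
print(memory_layout(2, 32))  # (64, False): two channels are left empty
```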

Page 14: High Performance Hardware for Data Analysis


Selecting Disks

• 3.5” Drives

– 3TB, 4TB, 6TB per drive

– Pricing sweet spot is 3TB

– Use enterprise-grade drives, not consumer drives!

– SATA or SAS; SAS is slightly faster.

– 3.0 Gb/sec is fine; 6.0 Gb/sec is wasted on spinning drives

• 2.5” Drives

– 800GB and 1.2 TB

– More expensive than 3.5” drives

– More spindles and more performance

• SATA Solid State Drives

– 6.0 Gb/sec

– 2.5” and 1.8” options

– Expensive for now

– Not as deterministic as spindles
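Whether 3.0 Gb/s is enough comes down to link capacity versus what a single spinning drive can sustain. This is a minimal sketch: SATA's 8b/10b encoding is standard, but the 180 MB/s drive rate is an assumed, illustrative figure, not a measurement, and the names are hypothetical.

```python
def sata_payload_mb_per_s(line_rate_gbit: float) -> float:
    """Usable SATA link bandwidth in MB/s; 8b/10b encoding carries 8 data bits per 10 line bits."""
    data_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6

# Hypothetical sustained sequential rate for one 7.2K RPM 3.5" drive; check the drive datasheet.
ASSUMED_HDD_MB_PER_S = 180

for rate in (3.0, 6.0):
    link = sata_payload_mb_per_s(rate)
    print(f"SATA {rate} Gb/s: {link:.0f} MB/s link, one drive uses {ASSUMED_HDD_MB_PER_S / link:.0%}")
```

Under this assumption a single spinning drive cannot saturate even the 3.0 Gb/s link, so the faster interface buys little; SSDs are a different case.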

Page 15: High Performance Hardware for Data Analysis

Spindle / Core / Storage Depth Optimization

• Hadoop scales processing and storage together

– The cluster grows by adding more data nodes

– The ratio of processor to storage is the main adjustment

• Generally, aim for a 1 spindle / 1 core ratio

– I/O is in large blocks (64 MB to 256 MB)

– Primarily sequential read/write, very little random I/O

– 8 tasks will be reading or writing 8 individual spindles

• Drive Sizes and Types

– NL SAS or Enterprise SATA 6 Gb/sec

– Drive size is mainly a price decision

• Depth per Node

– Up to 48 TB/node is common

– 112 TB/node is possible

– Consider how much data is ‘active’

– Very deep storage impacts recovery performance
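The effect of storage depth on recovery can be made concrete with a back-of-the-envelope estimate of how long it takes to restore replication after one data node fails. This is a minimal sketch, assuming HDFS-style re-replication in which the lost blocks are spread evenly and every surviving node copies in parallel at the same rate. The 20-node cluster and the 50 MB/s per-node rate are hypothetical; real rates depend on the network, disk load, and re-replication throttling.

```python
def rereplication_hours(node_data_tb: float, surviving_nodes: int, per_node_mb_per_s: float) -> float:
    """Rough time to restore replication after losing one data node.

    Assumes the lost data is spread evenly and each surviving node re-replicates
    at `per_node_mb_per_s` in parallel with the others.
    """
    data_mb = node_data_tb * 1_000_000  # decimal TB -> MB
    aggregate_mb_per_s = surviving_nodes * per_node_mb_per_s
    return data_mb / aggregate_mb_per_s / 3600

# Hypothetical 20-node cluster (19 survivors), 50 MB/s per surviving node.
for depth_tb in (48, 112):
    print(f"{depth_tb} TB on the failed node: ~{rereplication_hours(depth_tb, 19, 50):.1f} h")
```

Recovery time grows linearly with the data on the failed node, which is why the amount of actually stored ("active") data matters as much as raw drive capacity.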

Page 16: High Performance Hardware for Data Analysis

PowerEdge C8000 Hadoop Scaling - 16 core Xeon

[Chart: configuration (1), 12 spindles x 3 TB, versus configuration (3), 6 spindles x 3 TB; y-axis: TB storage; series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3)]

Page 17: High Performance Hardware for Data Analysis


Network Architecture – Layer 2 Switching

Page 18: High Performance Hardware for Data Analysis


Network and Switches

• Simple Tree Structure

– Top of Rack (TOR) for each rack / group of nodes

– Racks feed up to a Cluster or Aggregation Switch

– All switching is at Layer 2 (Ethernet)

› No fancy routing or layer 3 (IP) packet inspection

– Most switches in this class have 48 ports

• Switch Characteristics

– Line rate switching at 10Gbps

– Deep buffers to handle bursts

– Virtual Link Trunking (VLT): two switches act as one, with failover

– Uplinks are 40GbE

• High Availability and Performance

– Use two 10GbE links to alternate switches

– Bond at the Linux level into a single device
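With line-rate 10GbE server ports and 40GbE uplinks, a useful number to check for each top-of-rack switch is its oversubscription ratio: how much server-facing bandwidth shares each unit of uplink capacity. This is a minimal sketch; the port counts below are hypothetical and not taken from a specific Dell switch.

```python
def oversubscription(server_ports: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth on one top-of-rack switch."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# Hypothetical TOR switch: 48 x 10GbE server ports and 4 x 40GbE uplinks.
print(f"{oversubscription(48, 10, 4, 40):.1f}:1")  # 3.0:1
```

A lower ratio means rack-to-rack traffic, such as the Hadoop shuffle phase, is less likely to queue at the uplinks; the deep buffers mentioned above help absorb the bursts that remain.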

Page 19: High Performance Hardware for Data Analysis

In the Wild – Dell Customer Hadoop Configurations

Model           Data Node Configuration                       Comments
R720xd          Dual socket, 12 cores, 24 x 2.5” spindles     Most popular platform for Hadoop
C8000           Dual socket, 16 cores, 16 x 3.5” spindles     Popular for deep/dense Hadoop applications
C6100 / C6105   Dual socket, 8/12 cores, 12 x 3.5” spindles   Two-node version; C6100 is hardware EOL
C2100           Dual socket, 12 cores, 12 x 3.5” spindles     Popular; hardware EOL but often repurposed for Hadoop
R620            Dual socket, 8 cores, 10 x 2.5” spindles      1U form factor
C6220           Dual socket, 8 cores, 6 x 2.5” spindles       Core/spindle ratio is not ideal for Hadoop
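The 1 spindle / 1 core guideline from the storage depth slide can be checked directly against these configurations. This is a minimal sketch: the helper names are hypothetical, the C6100 / C6105 row is omitted because its core count (8 or 12) varies by model, and a ratio below 1.0 is treated as a simple flag, not a hard rule.

```python
# (cores, spindles) per data node, copied from the customer configuration table.
CONFIGS: dict[str, tuple[int, int]] = {
    "R720xd": (12, 24),
    "C8000": (16, 16),
    "C2100": (12, 12),
    "R620": (8, 10),
    "C6220": (8, 6),
}

def spindles_per_core(cores: int, spindles: int) -> float:
    return spindles / cores

def below_one_spindle_per_core(configs: dict[str, tuple[int, int]]) -> list[str]:
    """Models with fewer spindles than cores, i.e. under the 1 spindle / 1 core guideline."""
    return [model for model, (cores, spindles) in configs.items()
            if spindles_per_core(cores, spindles) < 1.0]

for model, (cores, spindles) in CONFIGS.items():
    print(f"{model:7s} {spindles_per_core(cores, spindles):.2f} spindles per core")
print("Below 1:1 ->", below_one_spindle_per_core(CONFIGS))
```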

Page 20: High Performance Hardware for Data Analysis

What I Haven’t Talked About!

• GPUs

– Possible, but not often seen with Hadoop

• Ingest / Streaming

– Usually a custom configuration for high-speed capture/loading (e.g. Kafka, Storm)

• Dell PowerEdge VRTX

– Designed as a ‘mini-blade’ for branch offices

– Could make a killer data science workstation

Page 21: High Performance Hardware for Data Analysis


Thank you!

Page 22: High Performance Hardware for Data Analysis


High Performance Hardware for Data Analysis

• Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.

• This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session that focuses on performance and cost tradeoffs across many different options.