
Building Systems for Big Data and Big Compute

Steve Scott, Cray CTO

Smoky Mountains Conference, September 1, 2016


We’ve Been Doing “Big Data” For a Long Time

Massive Datasets

High Performance Memory, Interconnects, and Storage


Disruptive Memory Technology


● Standard DDR memory BW has not kept pace with CPUs

● HBM:
  ● ~10x higher BW, ~10x less energy/bit
  ● Costs ~2x DDR4 per bit

[Chart: Today’s DDR4 vs. Future HBM3, comparing bandwidth (GB/s) and energy (pJ/bit) for 4 channels of 2.4 GHz DDR4 vs. 4 stacks of gen-3 HBM on package]

May want more, smaller nodes, with better BW and capacity per op
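To make the chart concrete, here is a back-of-envelope sketch in Python. Only the DDR4 math follows from the configuration on the slide; the HBM per-stack bandwidth and both pJ/bit figures are assumptions chosen to be consistent with the bullets above, not vendor specifications.

```python
# Back-of-envelope comparison of the two memory systems in the chart above.
# The HBM per-stack bandwidth and both pJ/bit figures are illustrative
# assumptions (not vendor specs); the DDR4 math follows from the slide.

ddr4_channels = 4
ddr4_transfers = 2.4e9        # 2400 MT/s per channel
ddr4_bytes_per_transfer = 8   # 64-bit channel
ddr4_bw = ddr4_channels * ddr4_transfers * ddr4_bytes_per_transfer / 1e9

hbm_stacks = 4
hbm_bw_per_stack = 400.0      # GB/s per gen-3 stack (assumed)
hbm_bw = hbm_stacks * hbm_bw_per_stack

ddr4_pj_per_bit = 20.0        # assumed
hbm_pj_per_bit = 2.0          # assumed, ~10x less energy/bit

print(f"DDR4: {ddr4_bw:6.1f} GB/s at {ddr4_pj_per_bit} pJ/bit")
print(f"HBM:  {hbm_bw:6.1f} GB/s at {hbm_pj_per_bit} pJ/bit")
```

Under these assumptions the DDR4 configuration delivers about 77 GB/s against roughly 1600 GB/s for the HBM package, which is why the slide argues for rethinking node size and balance around the new memory.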


Most “Big Data” Jobs Aren’t That Big


● Aggregate data is becoming very large, but most analytic jobs are modest
  ● Typical data analytics workloads: 10 GB mean, 100 GB 95th percentile
  ● Prabhat: big HPC analytics jobs are ~10x larger than that
  ● Many data analytics jobs run on a handful of cores

● Meanwhile, the APEX procurement wants multiple PB of memory!

[Figure: a 1 TB “Big Data” job shown against 3 PB of main memory]
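Even the slide’s generous 1 TB example job barely registers against a multi-PB machine:

$$\frac{1\ \text{TB}}{3\ \text{PB}} = \frac{10^{12}\ \text{bytes}}{3 \times 10^{15}\ \text{bytes}} \approx 0.03\%$$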


● I’ll interpret “Big Data” as meaning data analytics
  ● Extracting knowledge/insight from data
  ● As opposed to simulation and modeling, which generally produce data


Convergence of HPC and Big Data


What do we need to be doing in HPC that is different from what we have done in the past?


What is an optimal design for HPC?


Node Architecture A

• Dual Haswell nodes @ 2.4 GHz
• 128 GB DDR4 @ 2.66 GHz
• 12.5 GB/s/node network bandwidth

Node Architecture B

• Dual Haswell nodes @ 2.6 GHz
• 256 GB DDR4 @ 2.66 GHz
• 25 GB/s/node network bandwidth

Which of these is better? Not at all clear.
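The answer depends on the workload’s balance of compute, memory, and communication. As a hedged illustration, the sketch below computes simple balance ratios for the two designs; the core count and flops/cycle are assumptions (they are not on the slide), while the clocks, capacities, and network bandwidths are.

```python
# Toy balance-ratio comparison of the two node designs above. Core count
# and flops/cycle are assumptions for illustration; the clock rates,
# memory capacities, and network bandwidths come from the slide.

cores = 32            # assumed: dual 16-core Haswell sockets
flops_per_cycle = 16  # assumed: AVX2 FMA, double precision, per core

nodes = {
    "A": {"ghz": 2.4, "mem_gb": 128, "net_gb_s": 12.5},
    "B": {"ghz": 2.6, "mem_gb": 256, "net_gb_s": 25.0},
}

for name, n in nodes.items():
    peak = cores * flops_per_cycle * n["ghz"] * 1e9  # peak flop/s
    print(f"Node {name}: {peak / 1e12:.2f} Tflop/s peak, "
          f"{n['mem_gb'] * 1e9 / peak:.3f} bytes/flop capacity, "
          f"{n['net_gb_s'] * 1e9 / peak:.4f} network bytes/flop")
```

Node B buys roughly twice the memory capacity and network bytes per flop for a few percent more peak compute, but presumably at a higher cost per node, so under a fixed budget the right choice depends on the workload mix.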


Landscape of Parallel Computing Research (Berkeley – 2006/2008):

§ Map Reduce
§ N-body methods
§ Graph traversal
§ Graphical models
§ Dense and sparse linear algebra
§ Spectral methods
§ Structured and unstructured grids
§ Combinational logic
§ Dynamic programming
§ Backtrack and branch-and-bound
§ Finite-state machines

State of Big Data: Use Cases and Ogre Patterns (NIST 2014):

§ Basic statistics – simple Map Reduce implementation (see the sketch below)
§ Generalized n-body problems
§ Graph-theoretic computations
§ Linear algebraic computations
§ Optimizations – e.g., linear programming
§ Integration/machine learning
§ Alignment problems – e.g., BLAST

Data Analytics can be considered just another set of workloads in a sea of workloads.
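As a concrete illustration of the first NIST ogre above, here is a minimal map-reduce computation of basic statistics in Python. The helper names are invented; the point is the shape of the computation: an embarrassingly parallel map over data partitions followed by an associative reduce.

```python
from functools import reduce

# Map phase: summarize each partition locally (count, sum, sum of squares).
# In a real Hadoop/Spark job this runs next to the data blocks.
def mapper(values):
    return (len(values), sum(values), sum(v * v for v in values))

# Reduce phase: partial summaries combine associatively, so they can be
# merged in any order, across any number of nodes.
def combine(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]  # stand-ins for data blocks
n, s, sq = reduce(combine, map(mapper, partitions))
mean = s / n
variance = sq / n - mean ** 2
print(f"n={n} mean={mean:.3f} variance={variance:.3f}")
```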


Generalizations About Analytics Workloads


● Data-centric workloads
  ⇒ Larger memories and local SSDs are helpful

● Vertical data motion is important
  ● Hadoop and Spark effectively move computation to the data, doing initial filtering of data locally
  ⇒ Don’t (usually) need much network bandwidth

● Notable exceptions: graph analytics and machine learning
  ● Graph analytics
    ● Can’t partition the data! So really hard to scale! (many get discouraged)
    ● Wants a network that can do fine-grained RDMA well (similar to some HPC)
  ● Machine learning (a sketch follows this list)
    ● The training problem can be parallelized, can use lots of data, and requires global communication
    ● Wants a very high performance network and memory system
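To make the machine-learning point concrete, here is a minimal sketch of data-parallel training with a global gradient reduction every step, written with mpi4py and NumPy. The model and gradient computation are stand-ins; what matters is that every rank participates in an allreduce each iteration, which is exactly the kind of global communication that rewards a strong network.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient(weights, shard):
    # Stand-in for a real gradient computed on this rank's data shard.
    return (weights - shard.mean(axis=0)) / comm.Get_size()

weights = np.zeros(1000)
shard = np.random.default_rng(comm.Get_rank()).normal(size=(64, 1000))

for step in range(10):
    g_local = local_gradient(weights, shard)
    g_global = np.empty_like(g_local)
    # Global communication: every rank contributes its gradient and
    # receives the sum; this allreduce dominates network traffic.
    comm.Allreduce(g_local, g_global, op=MPI.SUM)
    weights -= 0.1 * g_global  # simple SGD step
```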


Merging of HPC and Data Analytics


[Diagram: Cray product lines converging]

● Urika-GD: custom graph analytics engine
● Urika-XA: Hadoop, Spark, NoSQL
● XC40: world’s leading supercomputer (“Minerva”), HPC + analytics workflows
● Urika-GX (“Athena”): integrated system with Hadoop/Spark + graph analytics + HPC, including the Cray Graph Engine, an open analytics framework, the Aries network, and HPC underneath the covers

Why combine HPC and Analytics solutions in a single box?


Building an Analytics Machine


● Urika-GX approach:
  ● 48 Haswell nodes per cabinet
  ● Aries network
  ● Up to 512 GB DRAM per node
  ● Dual SATA HDDs per node
  ● Up to 4 TB SSD per node

● XC40 approach:
  ● 192 Haswell nodes per cabinet
  ● Aries network
  ● Up to 256 GB DRAM per node
  ● DataWarp 12 TB SSD blades, which can be dynamically shared across the system

But… we still need to address the Lustre metadata bottleneck for codes that do lots of “local” file I/O.


Using Shifter to Accelerate Per-Node I/O


• Demonstrated >100x speedup vs. straight Lustre on an IOPS benchmark at 256 nodes

• Demonstrated Spark scaling to 50,000 cores in a CUG 2016 paper

“NAS storage surprisingly close to local SSDs”

https://cug.org/proceedings/cug2016_proceedings/includes/files/pap125.pdf


Resource Management and Scheduling


Picture from Malte Schwarzkopf’s blog: http://www.firmament.io/blog/scheduler-architectures.html

● Analytics workloads can have very different scheduling needs than HPC workloads (a toy sketch follows this list)
  ● May want very fine-grained scheduling (cores, not nodes)
  ● May have long-running services processing streaming data
  ● May need to dynamically expand/contract
  ● May be tied to real-time events such as experimental control or output processing
  ● May be interactive/bursty (database utilization depends on queries)
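The first three needs are illustrated by the toy sketch below: an allocator that hands out cores rather than whole nodes and lets a running job grow and shrink. The class and its methods are invented for illustration and do not correspond to any real scheduler’s API.

```python
class CoreScheduler:
    """Toy core-granular allocator: jobs hold cores, not whole nodes,
    and can expand or contract while running (illustrative only)."""

    def __init__(self, nodes, cores_per_node):
        self.free = {n: cores_per_node for n in range(nodes)}
        self.jobs = {}  # job name -> {node: cores held}

    def allocate(self, job, cores):
        if cores > sum(self.free.values()):
            return False  # not enough free cores anywhere
        held = self.jobs.setdefault(job, {})
        for node in self.free:
            take = min(self.free[node], cores)
            if take:
                self.free[node] -= take
                held[node] = held.get(node, 0) + take
                cores -= take
            if cores == 0:
                break
        return True

    def release(self, job, cores):
        held = self.jobs.get(job, {})
        for node in list(held):
            if cores == 0:
                break
            take = min(held[node], cores)
            held[node] -= take
            self.free[node] += take
            cores -= take
            if held[node] == 0:
                del held[node]

sched = CoreScheduler(nodes=4, cores_per_node=32)
sched.allocate("stream-service", 4)  # long-running service holds 4 cores
sched.allocate("query-burst", 12)    # interactive query expands by 12 cores
sched.release("query-burst", 12)     # and contracts when the burst ends
```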


Other Analytics Implications (mostly SW)


● Greater diversity of programming languages & environments
  ● Python, R, Julia, Spark, Scala, ML frameworks, etc.
  ● MPI + OpenMP is a foreign concept to the analytics community
  ● Openness and container support are important

● Cloud interoperability (a minimal sketch follows)
  ● E.g.: source data from cloud ➝ compute/analyze ➝ store data back in cloud
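A minimal sketch of that pull-compute-push pattern using boto3; the bucket names, keys, and local paths are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Pull: stage the source data from the cloud onto fast local storage.
s3.download_file("source-bucket", "raw/input.csv", "/local/ssd/input.csv")

# ... run the compute/analyze step against the local copy ...

# Push: store the results back in the cloud.
s3.upload_file("/local/ssd/results.parquet", "results-bucket",
               "analyzed/results.parquet")
```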

● Data movement between apps
  ● HPC tends to focus on accelerating single applications
  ● Analytics workloads usually involve pipelines
  ● Shared data formats can allow data exchange in memory
    ● E.g.: Arrow in-memory data structure specification for columnar data (see the sketch below)
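As an illustration of the Arrow point, the sketch below shows two pipeline stages exchanging a columnar table through Arrow’s IPC stream format, so the consumer reads the producer’s columns without a serialization rewrite. It uses today’s pyarrow API; the column names and values are made up.

```python
import pyarrow as pa

# Producer stage: build a columnar table and write it to an IPC stream.
table = pa.table({"node_id": [0, 1, 2], "runtime_s": [12.5, 9.8, 14.1]})
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()

# Consumer stage (the next tool in the pipeline): read the same bytes
# back with zero-copy access to the columns.
reader = pa.ipc.open_stream(buf)
roundtrip = reader.read_all()
print(roundtrip.column("runtime_s"))
```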


Takeaways


● Strong motivation for HPC + Big Data in a single system
  ● Growing desire for HPC + analytic workflows
  ● More efficient when data can be transferred in memory/SSD
  ● Utilization is better with systems that can be dynamically provisioned

● Big Data is just another set of workloads
  ● Not that different (we already build machines to handle big data)
  ● On average, probably want more memory per node for analytics
  ● Some workloads don’t need much network, but others need a strong network
  ● May argue for heterogeneous systems (already do that for HPC)

● Biggest issue may be resource management/scheduling
  ● A few other software issues, but no show stoppers for converged systems


Thank You! Questions?