Page 1: Seeking Opportunities for Hardware Acceleration in Big ... · DNA sequencing (Dynamic Programming) 49x – 980x speedup ... Brown and Vranesic, Fundamentals of Digital Logic Design

High-Performance Reconfigurable Computing Group

Department of Electrical and Computer Engineering University of Toronto

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Paul Chow

Page 2: Seeking Opportunities for Hardware Acceleration in Big ... · DNA sequencing (Dynamic Programming) 49x – 980x speedup ... Brown and Vranesic, Fundamentals of Digital Logic Design

Who am I?

I am interested in building computing machines/accelerators

Why accelerate?

To go faster, to reduce power

How to accelerate?

We use FPGAs

What are we doing?

Some of our projects

What are the opportunities in Big Data?

July 3, 2014 Tools to Tackle Big Data

Page 3

Moore’s Law

Page 4

Until…

I’m melting!!!! - Wizard of Oz 1939

Page 5

More cores!!!

Programming is hard, and it’s not always fast enough

Page 6

Need for accelerators

GPUs

FPGAs

Page 7

What about these FPGAs?

First, a quick introduction…

Page 8

FIELD-PROGRAMMABLE GATE ARRAYS

Page 9

FIELD-PROGRAMMABLE GATE ARRAYS

© H.Roesner

© flickr / MadPhysicist

Page 10

Hardwired FPGA functions

[Figure: hardwired blocks labelled SRAM (x2), DSP, SerDes, 1GE MAC, ARM A9 (x2)]

Page 11

Computing with FPGAs

•  Fully customized dataflow and buffering

•  Tightly coupled pipelining of computations

•  Very low energy / computation ratio

Page 12

Example: Smith-Waterman DNA sequencing (Dynamic Programming)

49x – 980x speedup (I/O dependent) on a Xilinx V4-LX160 FPGA vs. a 2.2 GHz AMD Opteron (Storaasli/Cray 2009)
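For readers unfamiliar with the algorithm, here is a minimal software sketch of the Smith-Waterman dynamic-programming recurrence. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are illustrative assumptions, not parameters from the cited study, and this is not the FPGA implementation. Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; that independence is the kind of fine-grain parallelism a hardware pipeline can exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor at 0 is what makes the alignment local
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

With these assumed scores, two identical 4-base strings score 8 (four matches at 2 each), and two strings with no base in common score 0.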

Page 13

Molecular Dynamics

•  Simulate motion of molecules at atomic level

•  Highly compute-intensive

•  Understand protein folding

•  Computer-aided drug design
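To give a feel for why molecular dynamics is compute-intensive, the sketch below evaluates pairwise forces with a Lennard-Jones potential, a common textbook model for short-range nonbonded interactions. The choice of potential, the function name `lj_forces`, and the reduced units (`epsilon`, `sigma`) are assumptions for illustration; the slides do not say which force field the group used. The double loop visits every pair of atoms, so the work grows roughly with the square of the atom count on every time step.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Per-atom force vectors from pairwise Lennard-Jones interactions.

    positions: list of [x, y, z] coordinates (reduced units, assumed).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # each pair visited once
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            s6 = (sigma * sigma / r2) ** 3  # (sigma / r)^6
            # F_i = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
            scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]  # Newton's third law
    return forces
```

Two atoms one `sigma` apart repel each other; at a separation of 2^(1/6) `sigma` the force vanishes, which is the minimum of the potential.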

Page 14

Platform for MD

[Figure: four-socket platform (Socket 0 to Socket 3). Labels: Quad Xeon, Sys Mem (x3), FSB (x4), NBE (x12), PME (x2), MEM (x2), 8.5 GB/s @ 1066 MHz, 72.5 GB/s]

[Chart: Initial Breakdown of CPU Time. Labels: Short range Nonbonded, Long range Electrostatic, Bonds]

•  12 short range nonbond FPGAs
–  2-3 pipelines/NBE FPGA; each runs 15-30x CPU
–  NBE 360-1080x

•  2 PME FPGAs with fast memory and fibre optic interconnects
–  PME 420x

•  Bonds on quad-core Xeon server
–  Bonds 1x
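The 360-1080x figure quoted for the NBE on this slide is consistent with simply multiplying the other numbers given: FPGAs, pipelines per FPGA, and per-pipeline speedup. The short check below reproduces that arithmetic; it assumes the pipelines scale linearly with no communication overhead, which is an interpretation of the slide rather than something it states. The function name is hypothetical.

```python
def nbe_speedup_range(fpgas=12, pipelines=(2, 3), per_pipeline=(15, 30)):
    """Aggregate NBE speedup range, assuming perfectly linear scaling."""
    low = fpgas * pipelines[0] * per_pipeline[0]
    high = fpgas * pipelines[1] * per_pipeline[1]
    return low, high
```

Calling `nbe_speedup_range()` with the slide's numbers gives (360, 1080).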

Page 15

Performance

•  Significant overlap between all force calculations.

•  108.02 ms is equivalent to between 80 and 88 Infiniband-connected cores at U of T’s supercomputer, SciNet.

•  160-176 hyperthreaded cores

•  Can we do better?
–  140 with hardware bond engines – change engine from SW to HW, no architectural change

Page 16

More Recently

ISCA 2014, June 16, 2014

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

•  Demonstrates that FPGAs can work in a data centre

•  Accelerate the Bing ranking engine

•  Double performance for only 10% increase in power

•  To be deployed in Bing in 2015!

Page 17

Traditional Programming of an FPGA

Page 18

Design Flow

Figure 12.1. A typical CAD system.

Design conception → Design entry (schematic capture, Verilog) → Synthesis → Functional simulation → Design correct? (No: iterate; Yes: continue) → Physical design → Timing simulation → Timing requirements met? (No: iterate; Yes: continue) → Chip configuration

Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

Page 19

Placement

Figure 12.8. Placement of the circuit in Figure 12.6.

Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

Page 20

Routing

Figure 12.9. Routing for the placement in Figure 12.8.

Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

Page 21

Timing

Table 12.1. A summary of static timing analysis results.

Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

Page 22

FPGA Programmability Drawbacks

•  Need to understand hardware design

•  Implementation (compile-equivalent) takes hours

•  Established Hardware Description Languages (Verilog HDL, VHDL) are very low-level

•  Downside of design flexibility: No established programming models

Page 23

Programmability is Improving

•  Higher-level HDLs
–  SystemC, SystemVerilog, Bluespec

•  High-Level-Synthesis (“C-to-gates”)
–  Very active area in research and industry, but:
–  So far, only useful if programmer understands hardware
–  Target not just an instruction set
•  designing the processor too!
–  Higher level of abstraction allows easier design exploration

•  Today, possible to achieve better than hand design in some cases

Page 24

So, why are FPGAs still interesting?

•  Even at 10% of a CPU/GPU clock rate

•  Very high performance for the right applications
–  Building an application-specific computer
–  Custom memory architectures
–  Data stream processing especially fits
•  Caches don’t get in the way
–  Fine-grain parallelism
–  Bit manipulation
–  Pattern matching

•  Performance per watt
–  25W per chip versus 150W per chip

•  Compute density
–  Racks of servers reduced to less than one
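A back-of-the-envelope model helps show how a device clocked at 10% of a CPU can still come out ahead. The sketch below compares a CPU that needs several cycles per result with an FPGA pipeline that emits one or more results every cycle, then folds in the 25W and 150W figures from the slide. The function name and the workload numbers used in the note that follows (2 GHz CPU, 50 cycles per result, 4 parallel pipelines) are hypothetical assumptions chosen only to illustrate the arithmetic; they are not measurements.

```python
def fpga_vs_cpu(cpu_clock_hz, cpu_cycles_per_result, fpga_results_per_cycle,
                fpga_clock_fraction=0.10, fpga_watts=25.0, cpu_watts=150.0):
    """Return (throughput speedup, performance-per-watt gain) of FPGA over CPU.

    Clock fraction and wattages default to the figures on the slide;
    all workload parameters are caller-supplied assumptions.
    """
    cpu_throughput = cpu_clock_hz / cpu_cycles_per_result
    fpga_clock_hz = cpu_clock_hz * fpga_clock_fraction
    fpga_throughput = fpga_clock_hz * fpga_results_per_cycle
    speedup = fpga_throughput / cpu_throughput
    perf_per_watt_gain = speedup * (cpu_watts / fpga_watts)
    return speedup, perf_per_watt_gain
```

Under those assumed workload numbers, `fpga_vs_cpu(2e9, 50, 4)` works out to roughly a 20x throughput advantage and, with the 6:1 power ratio, roughly 120x in performance per watt. If the CPU needed only one cycle per result and the FPGA had a single pipeline, the same model gives a 10x slowdown, which is why the slide stresses "the right applications".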

Page 25

Often used closer to the data

Page 26

SOME POSSIBLY RELEVANT PROJECTS

Page 27

FPGAs as OpenStack Cloud Resources

•  Now we can “boot” a network connected FPGA accelerator on demand, in seconds!

•  Framework for HLS – use HLS to create and then “drop in” accelerators

[Figure: OpenStack Control & Management (C & M) and two Servers, each with an Agent. Other labels: Hypervisor, VM (multiple), FPGA(s), VFR (multiple)]

Page 28

Example

•  Dynamically scalable according to demand

•  Same OpenStack command to boot/release either resource!

[Figure: Outside World (Site Requests, Uploads, x10 000) and Data Center (Web Server, Data analysis engine, Internal Network). Legend: VM Resource, VFR]

Page 29

Accelerators under Hadoop

•  A Hadoop cluster with one x86 as master node and eight ZedBoards as slave nodes
–  FPGA: computation
–  ARM processor: communication and task tracing

Page 30

MapReduce Data Flow

[Figure: HDFS → Data0, Data1, Data2 → Map (x3) → Reduce → Result → HDFS]
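The data flow on this slide can be sketched in a few lines of ordinary Python. This is a single-process toy, not Hadoop: the word-count workload and the names `map_phase`, `reduce_phase`, and `map_reduce` are assumptions chosen for illustration, and plain lists stand in for the HDFS input splits and output. The `mapper` argument marks the point where a different, possibly accelerated, implementation of the map step could be substituted without changing the rest of the flow.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in chunk.split()]


def reduce_phase(pairs):
    """Reduce: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def map_reduce(chunks, mapper=map_phase, reducer=reduce_phase):
    """Run the mapper over every split, then reduce the combined output."""
    intermediate = []
    for chunk in chunks:          # in a real cluster these run in parallel
        intermediate.extend(mapper(chunk))
    return reducer(intermediate)
```

For example, `map_reduce(["a b a", "b c", "a"])` counts three occurrences of "a", two of "b", and one of "c".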

Page 32

MapReduce Data Flow with FPGA

[Figure: HDFS → Data0, Data1, Data2 → Map (x3) → Reduce → Result → HDFS, with three FPGA blocks added]

Page 33

PGAS: Global Shared Memory

–  Easy data transfers between all system memories
–  Productive but efficient high-level programming model

[Figure: Host CPU (x86) with Application, Soft API, Network Drivers; FPGA with Embedded CPU (Application, Soft API), Custom Hardware, Hard API (x2), Network; memories: SRAM (x2), DRAM (x2)]

Page 34

HPC system: BEE4 + PC hosts

[Figure labels: PCIe 2 x8, 3.2 GB/s; 2x 16GB DDR3-800; 1Gb Eth]

Page 35

Low power, embedded system: Zynq

[Figure labels: FPGA, On-Chip Network, Block RAM (x3), CH (x3), Gc (x3), DRAM, 1 Gbit Ethernet, x8]

Page 36

Big Data Memory Systems

To explore the use of FPGAs in the architecture of Big Data memory systems

Page 37

Are there better memory architectures?

Replace middleware in-memory system with application-specific architecture

Page 38

We Need Applications!

•  We are not users

•  We do not understand the applications

•  Do you have a potential application?

•  We could collaborate …

[email protected]

Page 39

Conclusions

•  FPGAs are handicapped by the tools – hard to use
–  Good – Abstraction is being raised
–  Bad – Iteration time is very long – can be hours

•  FPGAs can provide better performance for many interesting applications

•  FPGAs can provide better performance per watt

•  There should be good opportunities for FPGAs in Big Data – what are they?

Page 40

Thank you for your attention!

Questions?

Paul Chow [email protected]