The CoRAM FPGA Architecture for Reconfigurable Computing


Page 1:

The CoRAM FPGA Architecture for Reconfigurable Computing

Eric S. Chung, Weinan Ma, Michael Papamichael, Gabriel Weisz, James C. Hoe, Ken Mai

Computer Architecture Lab at Carnegie Mellon University

CMU/ECE/CALCM/Hoe IWLS, June 2011, slide‐1

Page 2:

What we “know” about the future

[Figure: 2009 Intl. Technology Roadmap for Semiconductors (ITRS road map 2009). Y-axis: normalized to 40nm (log). X-axis: Technology Node (nm): 40, 36, 32, 28, 25, 22, 20, 18, 16, 14, 13, 11. Series: area density, supply voltage, device power reduction. Annotations: 16X Area Density; Only 4X Lower Power.]

Must do more Ops/second for less Joules/second
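One way to read the two chart annotations together is a back-of-the-envelope estimate. It assumes a fixed die area and that every device stays active at the same rate; both assumptions are added here for illustration and are not stated on the slide. The symbols $P_{40}$ and $P_{\text{end}}$ (chip power at the 40nm node and at the end of the roadmap) are likewise introduced only for this sketch.

```latex
\[
\frac{P_{\text{end}}}{P_{40}}
  = \underbrace{16}_{\text{devices per unit area}}
    \times
    \underbrace{\tfrac{1}{4}}_{\text{power per device}}
  = 4
\]
```

Under those assumptions the same die would draw roughly 4X the power, so staying within a fixed power budget means either leaving part of the chip idle or getting about 4X more operations out of each joule.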

Page 3:

In‐Core Performance and Energy

MMM

Device | GFLOP/s (actual) | (GFLOP/s)/mm2 (normalized to 40nm) | GFLOP/J (normalized to 40nm)
Intel Core i7 (45nm) | 96 | 0.50 | 1.14
Nvidia GTX285 (55nm) | 425 | 2.40 | 6.78
Nvidia GTX480 (40nm) | 541 | 1.28 | 3.52
ATI R5870 (40nm) | 1491 | 5.95 | 9.87
Xilinx V6‐LX760 (40nm) | 204 | 0.53 | 3.62

• CPU and GPU benchmarking was compute‐bound; FPGA and Std Cell effectively compute‐bound (no off‐chip I/O)
• Power (switching+leakage) measurements isolated the core from the system
• For details see [Chung, et al. MICRO 2010]

Page 4:

More of the Same

FFT‐2^10

Device | GFLOP/s | (GFLOP/s)/mm2 | GFLOP/J
Intel Core i7 (45nm) | 67 | 0.35 | 0.71
Nvidia GTX285 (55nm) | 250 | 1.41 | 4.2
Nvidia GTX480 (40nm) | 453 | 1.08 | 4.3
ATI R5870 (40nm) | ‐ | ‐ | ‐
Xilinx V6‐LX760 (40nm) | 380 | 0.99 | 6.5

Black‐Scholes

Device | Mopts/s | (Mopts/s)/mm2 | Mopts/J
Intel Core i7 (45nm) | 487 | 2.52 | 4.88
Nvidia GTX285 (55nm) | 10756 | 60.72 | 189
Nvidia GTX480 (40nm) | ‐ | ‐ | ‐
ATI R5870 (40nm) | ‐ | ‐ | ‐
Xilinx V6‐LX760 (40nm) | 7800 | 20.26 | 138

For details see [Chung, et al. MICRO 2010]

Page 5:

Why doesn’t everybody want to compute with FPGAs?

“Traditionally, FPGAs have been the bastard step‐brother of ASICs…”
Proceedings of ISFPGA 2004

• FPGAs today NOT designed for computing
  - tools and languages difficult to use
  - users exposed to device and platform details
  - no application portability

Page 6:

Review of FPGA Anatomy

[Figure: FPGA anatomy diagram. Labels: I/O pins; Interconnect; LUT; FF; Block RAM (4KB Block RAM with Local Address, Write Data, and Read Data ports).]

Block RAM

Programmable lookup tables (LUT) and flip‐flops (FF), aka “soft logic” or “fabric”

Page 7:

FPGA “Computers” Today

User responsible for:
  - application and data
  - platform I/O and memory interfaces
  - data distribution
  - memory control logic

[Figure: diagram of an FPGA. Labels: I/O Pads; Memory Controller (four instances); Application; SRAM; Control Logic; To/From Network.]

Page 8:

The CoRAM FPGA

  - Hard logic for data distribution (NoC)
  - “Software”‐managed memory hierarchy
  - Simple, portable abstraction for user

[Figure: diagram of the CoRAM FPGA. Labels: Memory Interfaces (Global Address Space); General‐purpose Network‐on‐Chip; Application; SRAM; Control threads.]

Page 9:

Simple Example

Application:

    module verilog_top(…);
      coram c0(/*ports*/);
      cofifo f0(/*ports*/);
      …
    endmodule

Control Thread:

    ctrlthread() {
      cohandle c0 = get_sram("c0");
      /* (1) */ c_coram_write(c0, sram_address, global_address, size);
      /* (2) */ c_fifo_write(1);
    }

[Figure: diagram with numbered steps (1) and (2). Labels: Application; C0 SRAM with Address, WrData, and RdData ports; Control Thread; Global Memory; Data; Channel FIFO.]
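The two numbered steps on this slide can be mimicked in plain C to make the intended data movement concrete. This is a software model only: every `model_*` name, the 4KB SRAM size, the FIFO depth, and the flat `global_memory` array are hypothetical stand-ins chosen for illustration, not the CoRAM tool's API. It also assumes, as the diagram suggests, that the write in step (1) moves data from global memory into the SRAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only so the model is concrete. */
enum { SRAM_BYTES = 4096, GLOBAL_BYTES = 1 << 16, FIFO_DEPTH = 8 };

typedef struct { uint8_t data[SRAM_BYTES]; } model_coram;
typedef struct { uint32_t slot[FIFO_DEPTH]; unsigned head, count; } model_fifo;

/* Stand-in for the global address space behind the memory interfaces. */
static uint8_t global_memory[GLOBAL_BYTES];

/* Step (1): copy `size` bytes from global memory into the SRAM.
 * Returns 0 on success, -1 if either range is out of bounds. */
static int model_coram_write(model_coram *ram, size_t sram_address,
                             size_t global_address, size_t size) {
    if (sram_address > SRAM_BYTES || size > SRAM_BYTES - sram_address) return -1;
    if (global_address > GLOBAL_BYTES || size > GLOBAL_BYTES - global_address) return -1;
    memcpy(&ram->data[sram_address], &global_memory[global_address], size);
    return 0;
}

/* Step (2): push a token into the channel FIFO. Returns -1 if full. */
static int model_fifo_write(model_fifo *f, uint32_t token) {
    if (f->count == FIFO_DEPTH) return -1;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = token;
    f->count++;
    return 0;
}

/* Application side: pop the oldest token. Returns -1 if empty. */
static int model_fifo_read(model_fifo *f, uint32_t *token) {
    if (f->count == 0) return -1;
    *token = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 0;
}

/* The control thread: fill the SRAM, then notify the application. */
static int model_ctrl_thread(model_coram *c0, model_fifo *f0, size_t sram_address,
                             size_t global_address, size_t size) {
    if (model_coram_write(c0, sram_address, global_address, size) != 0) return -1;
    return model_fifo_write(f0, 1);
}
```

In this model the application would call `model_fifo_read` and only then read the SRAM contents, which is the ordering the channel FIFO is there to enforce.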

Page 10:

Design Case Study: MMMult

MMM in hardware
  - N compute engines (1 per row of matrix)
  - custom data network

[Figure: MMM in hardware. Labels: DRAM holding A, B, and C (C=AB); Address Generator; Packet Generator; Network; Compute Engines.]

[Figure: Single Compute Engine. Labels: Packet Process (OP=load, OP=writeback); PktIn; PktOut; toNext; Control; Wen; Ren; multiply‐add (x +); A/B SRAMs; C SRAM.]

Packet Format: OP | PE_ID | DATA
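The packet format named on this slide (OP, PE_ID, DATA) might be represented as below. This is a rough sketch under stated assumptions: the slide gives no field widths or opcode encodings, so the types, the enum values, the linear chain of engines, and the helper names `pe_accepts` and `hops_to_destination` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings: the slide names the two operations but not their values. */
typedef enum { OP_LOAD = 0, OP_WRITEBACK = 1 } mmm_op;

/* Assumed widths: the slide names the fields but not their sizes. */
typedef struct {
    mmm_op   op;     /* OP: what the receiving engine should do */
    uint8_t  pe_id;  /* PE_ID: which compute engine the packet targets */
    uint32_t data;   /* DATA: payload word */
} mmm_packet;

/* An engine consumes a packet addressed to it; otherwise it passes it on. */
static int pe_accepts(uint8_t my_id, const mmm_packet *pkt) {
    return pkt->pe_id == my_id;
}

/* Walk an assumed linear chain of engines with IDs 0..num_pes-1.
 * Returns how many engines forwarded the packet before one consumed it,
 * or -1 if no engine in the chain matches. */
static int hops_to_destination(uint8_t num_pes, const mmm_packet *pkt) {
    for (uint8_t id = 0; id < num_pes; id++) {
        if (pe_accepts(id, pkt)) return id;
    }
    return -1;
}
```

The point of the sketch is that every engine needs packet-matching and forwarding logic of this kind in the hand-built design, which is part of the user-written data distribution burden the talk describes.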

Page 11:

MMMult with CoRAM

Control Thread Program

    void ctrl_thread() {
      for (j = 0; j < N; j += NB)
        for (i = 0; i < N; i += NB)
          for (k = 0; k < N; k += NB) {
            c_fifo_read(…);
            for (m = 0; m < NB; m++) {
              c_collective_write(ramsA, m*NB, A + i*N+k + m*N, NB*dsz);
              …
            }
            c_fifo_write(…);
          }
    }

[Figure: compute engine driven through a Channel FIFO. Labels: PEs ready; Start compute; Control; multiply‐add (x +); A/B SRAMs; C SRAM.]
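The address arithmetic in the inner loop of the control thread can be checked with a small software model. This is a sketch under assumptions: `model_collective_write` is a hypothetical stand-in that copies into one flat buffer (the striping across several CoRAMs implied by "collective" is not modeled), the matrix is assumed to be row-major `float`, and `load_block_A` and `count_block_steps` are names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for c_collective_write: copies `bytes` bytes from
 * `global_src` into a flat buffer at element offset `ram_offset`. */
static void model_collective_write(float *rams, size_t ram_offset,
                                   const float *global_src, size_t bytes) {
    memcpy(rams + ram_offset, global_src, bytes);
}

/* Fetch the NB x NB block of the row-major N x N matrix A whose top-left
 * element is (i, k), one block row per call, as in the slide's m loop. */
static void load_block_A(float *ramsA, const float *A,
                         size_t N, size_t NB, size_t i, size_t k) {
    const size_t dsz = sizeof(float);          /* element size in bytes */
    assert(i + NB <= N && k + NB <= N);        /* block must fit in A */
    for (size_t m = 0; m < NB; m++) {
        model_collective_write(ramsA, m * NB, A + i * N + k + m * N, NB * dsz);
    }
}

/* Number of (j, i, k) block steps the three outer loops execute. */
static size_t count_block_steps(size_t N, size_t NB) {
    size_t steps = 0;
    for (size_t j = 0; j < N; j += NB)
        for (size_t i = 0; i < N; i += NB)
            for (size_t k = 0; k < N; k += NB)
                steps++;
    return steps;
}
```

For a 4x4 matrix with NB = 2, loading the block at (i, k) = (2, 2) copies rows 2 and 3, columns 2 and 3, and the outer loops run 8 block steps in total. Each step in the slide's program is bracketed by a FIFO read and a FIFO write, which this model leaves out.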

Page 12:

What we are doing now

• Eric Chung
  - architecture and API
  - microarchitecture and RTL design
• Weinan Ma (advised by Prof. Mai)
  - circuit‐level design
• Michael Papamichael
  - on‐chip data network
• Gabe Weisz
  - high‐level programming environment

Page 13:

Beneath the Abstraction

[Figure: two views of a CoRAM FPGA. Architecture labels: Control thread programs; SRAM; Application. Microarchitecture labels: Fabric; CoRAM; Cluster; Control Unit; Bulk Data Distribution (Network‐on‐Chip); Memory Interfaces (DRAM or Cache).]

Page 14:

Control Threads: Soft or Hard?

Control Thread Programs
  - Option 1: Synthesis to Fabric
  - Option 2: Compile to soft cores
  - Option 3: Hard micro‐controllers

Soft methods
  - only rely on RL fabric
  - must share logic with app
  - efficiency/perf depends on quality of C‐based synthesis and compilation

Hard method
  - dedicated silicon for cores
  - high perf/efficiency
  - consumes area whether used or not

Page 15:

RTL Prototyping (on FPGA)

[Figure: block diagram of the RTL prototype in three regions: DRAM Interfaces, DRAM Caches, and NoC and Clusters. Labeled components: Microblaze (100MHz); AXI Memory Bus; I/O Subsystem and Drivers; I/O Devices; Memory Controller (Xilinx MIG); Xilinx Platform Adapter; Arbiter; Cache Banks; Routers; CoRAM Clusters; Control Units; MMMultiply Compute Elements (x+). Labeled link widths: 128b and 512b. Labeled clocks: 100MHz and 200MHz.]

Page 16:

Is CoRAM A Good Idea?

What is the impact on
  - programmability and portability?
  - application performance?
  - area and power?

Some important questions
  - should CoRAM support be hard, soft, or hybrid?
  - how to incorporate I/O devices, multi‐FPGA?
  - what applications can CoRAM support?

Page 17:

Is CoRAM A Good Idea?

You can be the judge!

Tool release planned for late 2011

Visit us at: http://www.ece.cmu.edu/~coram

Relevant papers:
• Eric S. Chung, James C. Hoe, and Ken Mai. CoRAM: An In‐Fabric Memory Architecture for FPGA‐Based Computing. In Proceedings of FPGA‐19, 2011.
• Eric S. Chung, Peter Milder, James C. Hoe, and Ken Mai. Single‐chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? In Proceedings of MICRO‐43, 2010.

Page 18:

Computer Architecture Lab (CALCM)

Carnegie Mellon University

http://www.ece.cmu.edu/CALCM