Functioning Hardware from Functional Specifications … · IBM PL Day, November 18, 2014. Where’s my 10 GHz processor? Intel CPU Trends ... Bill Dally’s 2009 DAC Keynote, The


Functioning Hardware from Functional Specifications

Stephen A. Edwards, Martha A. Kim, Richard Townsend, Kuangya Zhai, Lianne Lairmore

Columbia University

IBM PL Day, November 18, 2014

Where’s my 10 GHz processor?

Intel CPU Trends

Sutter, The Free Lunch is Over, DDJ 2005. Data: Intel, Wikipedia, K. Olukotun

The Cray-2: Immersed in Fluorinert

1985, ECL, 150 kW

Where’s all that power going? What can we do about it?

Dally: Calculation Cheap; Communication Costly

64b FPU: 0.1 mm², 50 pJ/op, 1.5 GHz

64b 1 mm channel: 25 pJ/word

64b off-chip channel: 1 nJ/word

[Figure labels: 64b floating point; 20 mm; 10 mm: 250 pJ, 4 cycles]

“Chips are power limited and most power is spent moving data”

Performance = Parallelism

Efficiency = Locality

Bill Dally’s 2009 DAC Keynote, The End of Denial Architecture

Parallelism for Performance; Locality for Efficiency

Dally: “Single-thread processors are in denial about these two facts”

We need different programming paradigms and different architectures on which to run them.

The Future is Wires and Memory

How best to use all those transistors?

Dark Silicon

Taylor and Swanson’s Conservation Cores

[Figure: C-core generation. A control-flow graph (BB0, BB1, BB2) is turned into a datapath and an inter-basic-block state machine, emitted as Verilog (.V), and passed through Synopsys IC Compiler (place and route, clock-tree synthesis). 0.01 mm² in 45 nm TSMC runs at 1.4 GHz.]

Code to stylized Verilog and through a CAD flow.

Custom datapaths and controllers for loop kernels; uses the existing memory hierarchy.

Swanson, Taylor, et al. Conservation Cores. ASPLOS 2010.

Bacon et al.’s Liquid Metal

Fig. 2. Block level diagram of DES and Lime code snippet

JITting Lime (Java-like, side-effect-free, streaming) to FPGAs

Huang, Hormati, Bacon, and Rabbah, Liquid Metal, ECOOP 2008.

What are we doing about it?

Functional Programs to FPGAs

Why Functional Specifications?

• Referential transparency/side-effect freedom make formal reasoning about programs vastly easier

• Inherently concurrent and race-free (thank Church and Rosser). If you want races and deadlocks, you need to add constructs.

• Immutable data structures make it vastly easier to reason about memory in the presence of concurrency

Why FPGAs?

• We do not know the structure of future memory systems: Homogeneous/heterogeneous? Levels of hierarchy? Communication mechanisms?

• We do not know the architecture of future multi-cores: Programmable in assembly/C? Single- or multi-threaded?

Use FPGAs as a surrogate. Ultimately too flexible, but representative of the long-term solution.

The Practical Question

How do we synthesize hardware from pure functional languages for FPGAs?

Control and datapath are easy; the memory system is interesting.

The Type System: Algebraic Data Types

Types are primitive (Boolean, Integer, etc.) or other ADTs:

type ::= Type                                Primitive type
       | Constr Type* | ... | Constr Type*   Tagged union

Examples:

data Intlist = Nil                     -- Linked list of integers
             | Cons Int Intlist

data Bintree = Leaf Int                -- Binary tree of integers
             | Branch Bintree Bintree

data Expr = Literal Int                -- Arithmetic expression
          | Var String
          | Binop Expr Op Expr

data Op = Add | Sub | Mult | Div
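
As a rough illustration of how such tagged unions are consumed, the sketch below pattern-matches on the example types, with one equation per constructor. The functions `sumList`, `sumTree`, and `eval`, and the `env` association list, are hypothetical names introduced here and are not part of the talk.

```haskell
-- Types from the slide, repeated so the block is self-contained.
data Intlist = Nil | Cons Int Intlist
data Bintree = Leaf Int | Branch Bintree Bintree
data Op      = Add | Sub | Mult | Div
data Expr    = Literal Int | Var String | Binop Expr Op Expr

-- Sum a linked list: one equation per constructor.
sumList :: Intlist -> Int
sumList Nil         = 0
sumList (Cons x xs) = x + sumList xs

-- Sum the leaves of a binary tree.
sumTree :: Bintree -> Int
sumTree (Leaf n)     = n
sumTree (Branch l r) = sumTree l + sumTree r

-- Evaluate an expression. `env` (an assumption of this sketch) maps
-- variable names to values; unbound variables and division by zero
-- yield Nothing.
eval :: [(String, Int)] -> Expr -> Maybe Int
eval _   (Literal n)    = Just n
eval env (Var v)        = lookup v env
eval env (Binop a op b) = do
  x <- eval env a
  y <- eval env b
  apply op x y
  where
    apply Add  p q = Just (p + q)
    apply Sub  p q = Just (p - q)
    apply Mult p q = Just (p * q)
    apply Div  _ 0 = Nothing
    apply Div  p q = Just (p `div` q)
```

For instance, `eval [("x", 4)] (Binop (Var "x") Mult (Literal 5))` evaluates to `Just 20`.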

Example: Huffman Decoder in Haskell

data HTree = Branch HTree HTree | Leaf Char

decode :: HTree -> [Bool] -> [Char]
decode table str = bit str table
  where
    bit (False:xs) (Branch l _) = bit xs l         -- 0: left
    bit (True:xs)  (Branch _ r) = bit xs r         -- 1: right
    bit x          (Leaf c)     = c : bit x table  -- leaf
    bit []         _            = []               -- done
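
To make the decoder concrete, here is a minimal runnable sketch with a small code table. `exampleTable` and `exampleBits` are hypothetical test data chosen for this example, not taken from the talk. One caveat the slide code shares: a table that is a single `Leaf` never consumes input, so `decode` would not terminate on it.

```haskell
data HTree = Branch HTree HTree | Leaf Char

-- Walk the tree one bit at a time; on reaching a leaf, emit its
-- character and restart from the root (`table`).
decode :: HTree -> [Bool] -> [Char]
decode table str = bit str table
  where
    bit (False:xs) (Branch l _) = bit xs l         -- 0: go left
    bit (True:xs)  (Branch _ r) = bit xs r         -- 1: go right
    bit x          (Leaf c)     = c : bit x table  -- leaf: emit, restart
    bit []         _            = []               -- input exhausted

-- Hypothetical code table: 'a' = 0, 'b' = 10, 'c' = 11.
exampleTable :: HTree
exampleTable = Branch (Leaf 'a') (Branch (Leaf 'b') (Leaf 'c'))

-- The bit string 0 10 11 0.
exampleBits :: [Bool]
exampleBits = [False, True, False, True, True, False]
```

With these definitions, `decode exampleTable exampleBits` yields `"abca"`.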

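[Figure: encodings. HTree: Leaf = tag 1 + 8-bit char; Branch = tag 0 + two 9-bit pointers. Boolean list: Cons = tag 1 + Boolean (B) + 12-bit pointer; Nil = tag 0. Character list: Cons = tag 1 + 8-bit char + 10-bit pointer; Nil = tag 0.]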

Optimizations

[Figure: the decoder's memory system at successive stages. Blocks shown: a single Memory holding Input, Output, and HTree; In FIFO; Out FIFO; Mem; HTree Mem.]

Steps shown:

1. Split memories
2. Use streams
3. Unroll for locality
4. Speculate

Hardware Synthesis: Semantics-preserving steps to a low-level dialect

Removing Recursion: The Fib Example

fib n = case n of
  1 -> 1
  2 -> 1
  n -> fib (n-1) + fib (n-2)

Transform to Continuation-Passing Style

fibk n k = case n of
  1 -> k 1
  2 -> k 1
  n -> fibk (n-1) (\n1 ->
         fibk (n-2) (\n2 ->
           k (n1 + n2)))

fib n = fibk n (\x -> x)
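
A typed, runnable sketch of this step, placed next to the direct-style definition so the two can be compared. The name `fibCPS` and the type signatures are additions for this example. As on the slides, both versions are defined only for n ≥ 1.

```haskell
-- Direct style, as on the earlier slide (defined for n >= 1).
fib :: Int -> Int
fib n = case n of
  1 -> 1
  2 -> 1
  _ -> fib (n - 1) + fib (n - 2)

-- Continuation-passing style: every call is a tail call, and "what to
-- do next" is passed explicitly as the function k.
fibk :: Int -> (Int -> r) -> r
fibk n k = case n of
  1 -> k 1
  2 -> k 1
  _ -> fibk (n - 1) (\n1 ->
         fibk (n - 2) (\n2 ->
           k (n1 + n2)))

-- Start with the identity continuation.
fibCPS :: Int -> Int
fibCPS n = fibk n id
```

The two agree on every input tried here; for example, `fibCPS 10` and `fib 10` both give `55`.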

Name Lambda Expressions (Lambda Lifting)

fibk n k = case n of
  1 -> k 1
  2 -> k 1
  n -> fibk (n-1) (k1 n k)

k1 n k n1 = fibk (n-2) (k2 n1 k)
k2 n1 k n2 = k (n1 + n2)
k0 x = x
fib n = fibk n k0

Represent Continuations with a Type

data Cont = K0 | K1 Int Cont | K2 Int Cont

fibk n k = case (n,k) of
  (1, k) -> kk k 1
  (2, k) -> kk k 1
  (n, k) -> fibk (n-1) (K1 n k)

kk k a = case (k, a) of
  ((K1 n k), n1)  -> fibk (n-2) (K2 n1 k)
  ((K2 n1 k), n2) -> kk k (n1 + n2)
  (K0, x)         -> x

fib n = fibk n K0
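
The sketch below restates this step with type signatures and adds one helper, `depth`, to show that a `Cont` value is in effect a linked stack of pending frames. `depth` is a hypothetical addition for illustration only.

```haskell
-- Each constructor is one kind of suspended computation; the trailing
-- Cont field links to the frame beneath it, so a Cont is a stack.
data Cont = K0 | K1 Int Cont | K2 Int Cont

fibk :: Int -> Cont -> Int
fibk n k = case n of
  1 -> kk k 1
  2 -> kk k 1
  _ -> fibk (n - 1) (K1 n k)

-- Apply a continuation to a result: the first-order replacement for
-- calling the lambda.
kk :: Cont -> Int -> Int
kk (K1 n k)  n1 = fibk (n - 2) (K2 n1 k)
kk (K2 n1 k) n2 = kk k (n1 + n2)
kk K0        x  = x

fib :: Int -> Int
fib n = fibk n K0

-- Number of pending frames above the base continuation K0.
depth :: Cont -> Int
depth K0       = 0
depth (K1 _ k) = 1 + depth k
depth (K2 _ k) = 1 + depth k
```

No function values remain: `fibk` and `kk` only build and inspect ordinary data, which is what makes the later memory-oriented steps possible.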

Merge Functions

data Cont = K0 | K1 Int Cont | K2 Int Cont
data Call = Fibk Int Cont | KK Cont Int

fibk z = case z of
  (Fibk 1 k)        -> fibk (KK k 1)
  (Fibk 2 k)        -> fibk (KK k 1)
  (Fibk n k)        -> fibk (Fibk (n-1) (K1 n k))

  (KK (K1 n k) n1)  -> fibk (Fibk (n-2) (K2 n1 k))
  (KK (K2 n1 k) n2) -> fibk (KK k (n1 + n2))
  (KK K0 x)         -> x

fib n = fibk (Fibk n K0)
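
One way to read the merged function is as a state machine: each case arm is a single transition from one `Call` to the next. The sketch below makes that reading explicit by splitting the function into a `step` and a driver loop. This split, and the names `step` and `run`, are a framing chosen for this example rather than something shown in the talk.

```haskell
data Cont = K0 | K1 Int Cont | K2 Int Cont
data Call = Fibk Int Cont | KK Cont Int

-- One transition: either the final result (Left) or the next Call.
step :: Call -> Either Int Call
step z = case z of
  Fibk 1 k        -> Right (KK k 1)
  Fibk 2 k        -> Right (KK k 1)
  Fibk n k        -> Right (Fibk (n - 1) (K1 n k))
  KK (K1 n k) n1  -> Right (Fibk (n - 2) (K2 n1 k))
  KK (K2 n1 k) n2 -> Right (KK k (n1 + n2))
  KK K0 x         -> Left x

-- Iterate `step` to completion, returning (result, transitions taken).
run :: Call -> (Int, Int)
run = go 1
  where
    go :: Int -> Call -> (Int, Int)
    go steps z = case step z of
      Left result -> (result, steps)
      Right z'    -> go (steps + 1) z'

fib :: Int -> Int
fib n = fst (run (Fibk n K0))
```

For example, `run (Fibk 1 K0)` takes two transitions (`Fibk 1 K0` to `KK K0 1`, then the result) and returns `(1, 2)`.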

Add Explicit Memory Operations

load :: CRef -> Cont
store :: Cont -> CRef
data Cont = K0 | K1 Int CRef | K2 Int CRef
data Call = Fibk Int CRef | KK Cont Int

fibk z = case z of
  (Fibk 1 k)        -> fibk (KK (load k) 1)
  (Fibk 2 k)        -> fibk (KK (load k) 1)
  (Fibk n k)        -> fibk (Fibk (n-1) (store (K1 n k)))

  (KK (K1 n k) n1)  -> fibk (Fibk (n-2) (store (K2 n1 k)))
  (KK (K2 n1 k) n2) -> fibk (KK (load k) (n1 + n2))
  (KK K0 x)         -> x

fib n = fibk (Fibk n (store K0))
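
On the slide, `load` and `store` stand for memory ports, so they are not ordinary pure functions. To run the idea in plain Haskell, the sketch below threads an explicit heap through the computation. The `Heap` type, its append-only association-list representation, and the changed signatures of `load` and `store` are all assumptions of this sketch; nothing is ever freed.

```haskell
type CRef = Int

data Cont = K0 | K1 Int CRef | K2 Int CRef
data Call = Fibk Int CRef | KK Cont Int

-- Hypothetical heap model: next free address plus stored cells.
data Heap = Heap { nextRef :: CRef, cells :: [(CRef, Cont)] }

emptyHeap :: Heap
emptyHeap = Heap 0 []

-- Write a continuation to a fresh address and return that address.
store :: Cont -> Heap -> (CRef, Heap)
store c (Heap r cs) = (r, Heap (r + 1) ((r, c) : cs))

-- Read the continuation stored at an address.
load :: CRef -> Heap -> Cont
load r h = case lookup r (cells h) of
  Just c  -> c
  Nothing -> error "load: dangling CRef"

fibk :: Call -> Heap -> Int
fibk z h = case z of
  Fibk 1 k        -> fibk (KK (load k h) 1) h
  Fibk 2 k        -> fibk (KK (load k h) 1) h
  Fibk n k        -> let (r, h') = store (K1 n k) h
                     in fibk (Fibk (n - 1) r) h'
  KK (K1 n k) n1  -> let (r, h') = store (K2 n1 k) h
                     in fibk (Fibk (n - 2) r) h'
  KK (K2 n1 k) n2 -> fibk (KK (load k h) (n1 + n2)) h
  KK K0 x         -> x

fib :: Int -> Int
fib n = let (r, h) = store K0 emptyHeap
        in fibk (Fibk n r) h
```

Because `Cont` now holds a fixed-width `CRef` instead of a nested `Cont`, every value in flight has a bounded size, which is the property a hardware encoding needs.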

Syntax-Directed Translation to Hardware

[Figure: generated circuit. Blocks: fib, fibk, load, store, load′, and mem (ports we, di, a, do). Signals: n, Call, Cont, CRef, result.]

Duplication Can Increase Parallelism

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

After duplicating functions:

fib 0 = 0
fib 1 = 1
fib n = fib' (n-1) + fib'' (n-2)

fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)

fib'' 0 = 0
fib'' 1 = 1
fib'' n = fib'' (n-1) + fib'' (n-2)

Here, fib' and fib'' may run in parallel.

Unrolling Recursive Data Structures

Original Huffman tree type:

data HTree = Branch HTree HTree | Leaf Char

Unrolled Huffman tree type:

data HTree   = Branch   HTree'  HTree'  | Leaf   Char
data HTree'  = Branch'  HTree'' HTree'' | Leaf'  Char
data HTree'' = Branch'' HTree   HTree   | Leaf'' Char

Increases locality: larger data blocks.

A type-aware cache line
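
A small sketch of the unrolled type alongside the original, with conversions in both directions. So that both can live in one module, the unrolled levels are renamed `U0`, `U1`, and `U2` here; those names and the `unroll`/`reroll` functions are hypothetical and stand in for HTree, HTree′, and HTree′′ above.

```haskell
-- Original recursive type.
data HTree = Branch HTree HTree | Leaf Char deriving (Eq, Show)

-- Three-level unrolling; U2 points back to U0, closing the cycle.
data U0 = Branch0 U1 U1 | Leaf0 Char deriving (Eq, Show)
data U1 = Branch1 U2 U2 | Leaf1 Char deriving (Eq, Show)
data U2 = Branch2 U0 U0 | Leaf2 Char deriving (Eq, Show)

-- Convert by cycling through the three levels on the way down.
unroll0 :: HTree -> U0
unroll0 (Leaf c)     = Leaf0 c
unroll0 (Branch l r) = Branch0 (unroll1 l) (unroll1 r)

unroll1 :: HTree -> U1
unroll1 (Leaf c)     = Leaf1 c
unroll1 (Branch l r) = Branch1 (unroll2 l) (unroll2 r)

unroll2 :: HTree -> U2
unroll2 (Leaf c)     = Leaf2 c
unroll2 (Branch l r) = Branch2 (unroll0 l) (unroll0 r)

-- Inverse direction, to show that no information is lost.
reroll0 :: U0 -> HTree
reroll0 (Leaf0 c)     = Leaf c
reroll0 (Branch0 l r) = Branch (reroll1 l) (reroll1 r)

reroll1 :: U1 -> HTree
reroll1 (Leaf1 c)     = Leaf c
reroll1 (Branch1 l r) = Branch (reroll2 l) (reroll2 r)

reroll2 :: U2 -> HTree
reroll2 (Leaf2 c)     = Leaf c
reroll2 (Branch2 l r) = Branch (reroll0 l) (reroll0 r)
```

The round trip `reroll0 . unroll0` is the identity, so the unrolled form describes the same trees. What changes is that up to three levels now have distinct types, which gives a compiler the option of laying them out together as one larger block.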

• Dark Silicon is the future: faster transistors; most must remain off

• Custom accelerators are the future; many approaches

• Our project: A Pure Functional Language to FPGAs

[Figure: the C-core generation diagram from the Conservation Cores slide, repeated.]

[Figure: abstraction versus time. Labels at "today": C et al., gcc et al., x86 et al. Labels for the future: future languages (higher-level languages), future ISAs (more hardware reality), a functional IR, FPGAs.]

• Algebraic Data Types in Hardware

• Optimizations

• Removing recursion

Encoding the Types

Huffman tree nodes: (19 bits)

  Leaf:   tag 1, 8-bit char
  Branch: tag 0, 9-bit pointer, 9-bit pointer

Boolean input stream: (14 bits)

  Cons: tag 1, Boolean (B), 12-bit pointer
  Nil:  tag 0

Character output stream: (19 bits)

  Cons: tag 1, 8-bit char, 10-bit pointer
  Nil:  tag 0
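
The field widths above can be exercised with a small packing sketch for the 19-bit Huffman tree node. The slide gives the tag values and field widths but the exact bit positions are not recoverable from the transcript, so the layout used here (tag in bit 0, fields above it) is an assumption, as are the names `Node`, `encode`, and `decodeNode`.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word32)

-- A tree node whose children are 9-bit addresses rather than subtrees.
data Node = Leaf Char | Branch Int Int deriving (Eq, Show)

-- ASSUMED layout (not from the slides):
--   bit 0       tag: 1 = Leaf, 0 = Branch
--   Leaf:       bits 1..8   8-bit char
--   Branch:     bits 1..9   left pointer, bits 10..18 right pointer
encode :: Node -> Word32
encode (Leaf c)     = (fromIntegral (ord c .&. 0xFF) `shiftL` 1) .|. 1
encode (Branch l r) = (fromIntegral (r .&. 0x1FF) `shiftL` 10)
                  .|. (fromIntegral (l .&. 0x1FF) `shiftL` 1)

-- Inverse of encode for values that fit the field widths.
decodeNode :: Word32 -> Node
decodeNode w
  | testBit w 0 = Leaf (chr (fromIntegral ((w `shiftR` 1) .&. 0xFF)))
  | otherwise   = Branch (fromIntegral ((w `shiftR` 1) .&. 0x1FF))
                         (fromIntegral ((w `shiftR` 10) .&. 0x1FF))
```

The widest case is a Branch: 1 tag bit plus two 9-bit pointers gives the 19 bits quoted above, and a Leaf uses only 9 of them.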
