June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike...

24
June 5, 2006 1 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology MIT-Nokia Architecture Group

Transcript of June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike...

Page 1: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

June 5, 2006 1

Architectural Exploration: 802.11a Transmitter

Arvind, Nirav Dave, Steve Gerding, Mike PellauerComputer Science & Artificial Intelligence LaboratoryMassachusetts Institute of Technology

MIT-Nokia Architecture Group Helsinki, June 5, 2006

Page 2: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

2

Why architectural exploration

Architects are clever people and can think of a variety of designsBut often cannot determine which design is best for a given metric (e.g., power) Too short of time and manpower to go far

enough with several designs for proper evaluation

Guess work instead of architectural exploration

New design tools can change all that

Page 3: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

3

This talk

Architectural exploration of 802.11a transmitter The goal is to show that it is easy and

economical to do so in Bluespec You don’t have to know 802.11a or Bluespec

to understand the talk

Page 4: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

4

802.11a Transmitter Overview

Controller Scrambler Encoder

Interleaver Mapper

IFFTCyclicExtend

headers

data

IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain)

complex numbers accounts for > 95% area

24 Uncoded

bits

One OFDM symbol (64 Complex Numbers)

Must produce one OFDM symbol every 4 sec

Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol

Page 5: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

5

Combinational IFFT

in0

in1

in2

in63

in3

in4

Radix 4

Radix 4

Radix 4

x16

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

out0

out1

out2

out63

out3

out4

Perm

ute

_1

Perm

ute

_2

Perm

ute

_3

All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ...

*

*

*

*

+

-

-

+

+

-

-

+

*jt2

t0

t3

t1

Page 6: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

6

Design Tradeoffs

1. We can decrease the area by multiplexing some circuits

It may be a win if the throughput requirements can be met without increasing the frequency

2. Power can be lowered by lowering the frequency, which can be adjusted by changing the voltage

power (voltage)2

Page 7: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

7

Combinational IFFTOpportunity for reuse

in0

in1

in2

in63

in3

in4

Radix 4

Radix 4

Radix 4

x16

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

out0

out1

out2

out63

out3

out4

Perm

ute

_1

Perm

ute

_2

Perm

ute

_3

Reuse the same circuit three times

Page 8: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

8

Circular pipeline: Reusing the Pipeline Stagein0

in1

in2

in63

in3

in4

out0

out1

out2

out63

out3

out4

Radix 4

Radix 4

Perm

ute

_1Perm

ute

_2Perm

ute

_3

Stage Counter16 Radix 4s can be

shared but not the three permutations. Hence the need for muxes

64

, 4-w

ay

Muxes

Page 9: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

9

Superfolded circular pipeline: Just one Radix-4 node!in0

in1

in2

in63

in3

in4

out0

out1

out2

out63

out3

out4

Radix 4

Perm

ute

_1Perm

ute

_2Perm

ute

_3

Stage Counter 0 to 2

Index Counter 0 to 15

64

, 4-w

ay

Muxes

4, 1

6-w

ay

Muxes

4, 1

6-w

ay

DeM

uxes

Designs with 2, 4, and 8 Radix-4 modules make sense too!

Page 10: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

10

Which design consumes the least energy to transmit a symbol?

Can we quickly code up all the alternatives? single source with parameters?

Not practical in traditional hardware description languages like Verilog/VHDL

Page 11: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

June 5, 2006 11

Expressing the designs in Bluespec

Page 12: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

12

Bluespec code: Radix-4 Nodefunction Vector#(4,Complex) radix4(Vector#(4,Complex) t, Vector#(4,Complex) k);

Vector#(4,Complex) m = newVector(), y = newVector(), z = newVector();

m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3];

y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]);

z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3];

return(z);endfunction

Polymorphic code: works on any type of numbers for which *, + and - have been defined

*

*

*

*

+

-

-

+

+

-

-

+

*j

Page 13: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

13

Combinational IFFTCan be used as a reference

in0

in1

in2

in63

in3

in4

Radix 4

Radix 4

Radix 4

x16

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

out0

out1

out2

out63

out3

out4

Perm

ute

_1

Perm

ute

_2

Perm

ute

_3

stage_f function

repeat it three times

Page 14: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

14

Bluespec Code for Combinational IFFT

function SVector#(64, Complex) stage_f(Bit#(2) stage, SVector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = radix4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx + 1] = y[1]; stage_temp[idx + 2] = y[2]; stage_temp[idx + 3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; endreturn(stage_out);

function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);//Declare vectors SVector#(4,SVector#(64, Complex)) stage_data = replicate(newSVector); stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[i+1] = stage_f(stage, stage_data[i]);return(stage_data[3]);

Stage function

The code is unfolded to generate a combinational circuit

Page 15: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

15

Synchronous pipeline

rule sync-pipeline (True); inQ.deq(); sReg1 <= f1(inQ.first()); sReg2 <= f2(sReg1); outQ.enq(f3(sReg2));endrule

xsReg1inQ

f1 f2 f3

sReg2 outQ

This is real IFFT code; just replace f1, f2 and f3 with stage_f code

Page 16: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

16

Folded pipeline

x

sReginQ

rule folded-pipeline (True); if (stage==1) begin inQ.deq(); sxIn= inQ.first(); end else sxIn= sReg; sxOut = f(stage,sxIn); if (stage==3) outQ.enq(sxOut); else sReg <= sxOut; stage <= (stage==3)? 1 : stage+1;endrule

f

outQstage

f1

f2

f3

function f (stage,sx); case (stage) 1: return f1(sx); 2: return f2(sx); 3: return f3(sx); endcaseendfunction

This is real IFFT code too ...

Page 17: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

17

Expressing these designs in Bluespec is easy

All these designs were done in less than one day!Area and power estimates?

Combinational

Pipelined

Folded (16 Radices)

Super-Folded (8 Radices)

Super-Folded (4 Radices)

Super-Folded (2 Radices)

Super-Folded (1 Radix)How long will it take to write these designs in Verilog? VHDL? SystemC?

Page 18: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

18

Bluespec Tool flowBluespec SystemVerilog source

Verilog 95 RTL

Verilog sim

VCD output

DebussyVisualization

Bluespec Compiler

RTL synthesis

gates

C

Bluespec C sim CycleAccurate

FPGAPower estimatio

n tool

Power estimatio

n tool

Sequence Design PowerTheater

Page 19: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

19

802.11a Transmitter Synthesis results for various IFFT designs

IFFT Design Area (mm2)

Min. CLK Period(ns)

Latency(clks/Sym)

ns/output(req 4000)

Combinational 15.15 33.0 10 132

Pipelined 15.50 12.2 12 49

Folded(16 Radices)

6.26 13.0 12 52

Super-Folded(8 Radices)

4.02 13.1 15 79

SF (4 Radices) 2.86 13.1 21 157

SF (2 Radices) 2.33 13.2 33 317

SF (1 Radix) 2.00 13.2 48 634

TSMC .18 micron; numbers reported are before place and route.Some areas will be larger after layout.

Page 20: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

20

Algorithmic Improvements

in0

in1

in2

in63

in3

in4

Radix 4

Radix 4

Radix 4

x16

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

Radix 4

out0

out1

out2

out63

out3

out4

Perm

ute

_1

Perm

ute

_2

Perm

ute

_3

1. All the three permutations can be made identical more saving in area

2. One multiplication can be removed from Radix-4

Page 21: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

21

802.11a Transmitter Synthesis results: old vs. new IFFT designs

IFFT Design Old Area (mm2)

New Area (mm2)

Combinational 15.15 5.91

Pipelined 15.50 6.26

Folded(16 Radices)

6.26 4.61

Super-Folded(8 Radices)

4.02 3.57

SF(4 Radices) 2.86 2.75

SF(2 Radices) 2.33 2.21

SF (1 Radix) 2.00 1.67

TSMC .18 micron; numbers reported are before place and route.

???

exp

ecte

d

Page 22: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

22

802.11a Transmitter Synthesis results with new IFFT designs

IFFT Design Area (mm2)

Min. CLK Period(ns)

Latency(clks/

Symbol)

Min. ns/output

PermittedClock scaling

Combinational 5.91 33.0 10 132 30

Pipelined 6.26 12.0 12 49 83

Folded(16 Radices)

4.61 13.0 12 52 77

Super-Folded(8 Radices)

3.57 13.1 15 79 51

SF(4 Radices) 2.75 13.1 21 157 25

SF(2 Radices) 2.21 13.1 33 314 13

SF (1 Radix) 1.67 13.1 57 629 6

TSMC .18 micron; numbers reported are before place and route.

Page 23: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

23

802.11a Transmitter with new IFFT designs: Power EstimatesIFFT Design

c1Area

(mm2) c2

Min Freq.

c3

Power(mW) @ 100MHz

c4

Power(mW) @ Min Freq.

c5

Energy/Symb(nJ)

c6

Combinational 5.91 1 MHz 398.6 0.399 1.594

Pipeline (48 R-4) 6.26 1 MHz 438.6 0.439 1.754

Folded (16 R-4) 4.61 1 MHz 475.6 0.476 1.902

SF (8 R-4) 3.57 1.5MHz 299.7 0.446 1.798

SF (4 R-4) 2.75 3MHz 166.2 0.499 1.994

SF (2 R-4) 2.21 6MHz 98.7 0.592 2.369

SF (1 R-4) 1.67 12MHz 66.2 0.794 3.178

c3 = min clock x scaling factor; c4 is raw data collected by the Sequence Design PowerTheater c5 = c4xc3/100MHz/voltage scaling(=10); c6 = c5x4 sec

Work in progress

Page 24: June 5, 20061 Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.

24

SummaryIt is essential to do architectural exploration for better (area, power, performance, ...) designs.It is possible to do so with new design tools and methodologies.Better and faster tools for estimating area, timing and power would dramatically increase our capability to do architectural exploration.

Thanks