Ziria: Wireless Programming for Hardware Dummies Božidar Radunović joint work with Gordon Stewart,...

Ziria: Wireless Programming for Hardware Dummies Božidar Radunović joint work with Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis http://research.microsoft.com/en-us/ projects/ziria/


Ziria: Wireless Programming for Hardware Dummies

Božidar Radunović, joint work with

Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis

http://research.microsoft.com/en-us/projects/ziria/

Layout

- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

Motivation – why this course?

- Lots of innovation in PHY/MAC design
- Popular experimental platform: GNURadio
  - Relatively easy to program, but slow; no real network deployment
- Modern wireless PHYs require high-rate DSP
- Real-time platforms [SORA, WARP, …]
  - Meet protocol processing requirements, but are difficult to program, offer no code portability, and need lots of low-level hand-tuning

Issues for wireless researchers

- CPU platforms (e.g. SORA)
  - Manual vectorization, CPU placement
  - Cache / data sizing optimizations
- FPGA platforms (e.g. WARP)
  - Latency-sensitive design, difficult for new students/researchers to break into
- Portability/readability
  - Manually highly optimized code is difficult to read and maintain
  - Also: practically impossible to target another platform

Difficulty in writing and reusing code hampers innovation

Hardware Platforms

- FPGA: programmer deals with hardware issues
  - WARP, Airblue
- CPUs: SORA bricks [MSR Asia], GNURadio blocks
  - SORA was a huge breakthrough: design of RX/TX with PCI interface, 16 Gbps throughput, ~μs latency
  - Very efficient C++ library
  - We build on top of SORA
- Many other options now available, e.g. http://myriadrf.org/


What is wrong with current tools?

Current SDR Software Tools

- Portable (FPGA/CPU), graphical interface:
  - Simulink, LabView
- CPU-based: C/C++/Python
  - GnuRadio, SORA
- Control and data separation
  - CodiPhy [U. of Colorado], OpenRadio [Stanford]
- Specialized languages (DSL):
  - Stream processing languages: StreamIt [MIT]
  - DSLs for DSP/arrays, Feldspar [Chalmers]: we put more emphasis on control
  - Spiral

Issues

- Programming abstraction is tied to execution model
  - Programmer has to reason about how the program will be executed/optimized while writing the code
- Verbose programming
- Shared state
- Low-level optimization

We next illustrate these issues on Sora code examples (other platforms have similar problems)

Running example: WiFi receiver

[Figure: WiFi receiver block diagram. Blocks: removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet. Edge labels: Packet start, Channel info, Packet info.]

How do we execute this on CPU?

[Figure: the WiFi receiver block diagram (removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet; edge labels Packet start, Channel info, Packet info)]

Dataflow streaming abstractions

Predominant abstraction today [e.g. SORA, StreamIt, GnuRadio] is that of a “vertex” in a dataflow graph

- Reasonable as an abstraction of the execution model
- Unsatisfactory as a programming and compilation model

[Figure: a vertex; events (messages) come in, events (messages) come out]

Why unsatisfactory? It does not expose:
(1) When is vertex state (re-)initialized?
(2) Under which external “control” messages can the vertex change behavior?
(3) How can a vertex transmit “control” information to other vertices?

Shared state

    static inline
    void CreateDemodGraph11a_40M (ISource*& srcAll, ISource*& srcViterbi, ISource*& srcCarrierSense)
    {
        CREATE_BRICK_SINK (drop, TDropAny, BB11aDemodCtx );
        CREATE_BRICK_SINK (fsink, TBB11aFrameSink, BB11aDemodCtx );
        CREATE_BRICK_FILTER (desc, T11aDesc, BB11aDemodCtx, fsink );
        typedef T11aViterbi <5000*8, 48, 256> T11aViterbiComm;
        CREATE_BRICK_FILTER (viterbi, T11aViterbiComm::Filter, BB11aDemodCtx, desc );
        CREATE_BRICK_FILTER (vit0, TThreadSeparator<>::Filter, BB11aDemodCtx, viterbi );
        // 6M
        CREATE_BRICK_FILTER (di6, T11aDeinterleaveBPSK, BB11aDemodCtx, vit0 );
        CREATE_BRICK_FILTER (dm6, T11aDemapBPSK::filter, BB11aDemodCtx, di6 );
        …

        …
        CREATE_BRICK_SINK (plcp, T11aPLCPParser, BB11aDemodCtx );
        CREATE_BRICK_FILTER (sviterbik, T11aViterbiSig, BB11aDemodCtx, plcp );
        CREATE_BRICK_FILTER (dibpsk, T11aDeinterleaveBPSK, BB11aDemodCtx, sviterbik );
        CREATE_BRICK_FILTER (dmplcp, T11aDemapBPSK::filter, BB11aDemodCtx, dibpsk );
        CREATE_BRICK_DEMUX5 (sigsel, TBB11aRxRateSel, BB11aDemodCtx, dmplcp, dm6, dm12, dm24, dm48 );
        CREATE_BRICK_FILTER (pilot, TPilotTrack, BB11aDemodCtx, sigsel );
        CREATE_BRICK_FILTER (pcomp, TPhaseCompensate, BB11aDemodCtx, pilot );
        CREATE_BRICK_FILTER (chequ, TChannelEqualization, BB11aDemodCtx, pcomp );
        CREATE_BRICK_FILTER (fft, TFFT64, BB11aDemodCtx, chequ );
        CREATE_BRICK_FILTER (fcomp, TFreqCompensation, BB11aDemodCtx, fft );
        CREATE_BRICK_FILTER (dsym, T11aDataSymbol, BB11aDemodCtx, fcomp );
        CREATE_BRICK_FILTER (dsym0, TNoInline, BB11aDemodCtx, dsym );

Callout: shared state (every brick above is given the same BB11aDemodCtx context)

Separation of control and data

    void Reset() {
        Next0()->Reset();
        // No need to reset all path, just reset the path we used in this frame
        switch (data_rate_kbps) {
        case 6000:
        case 9000:
            Next1()->Reset();
            break;
        case 12000:
        case 18000:
            Next2()->Reset();
            break;
        case 24000:
        case 36000:
            Next3()->Reset();
            break;
        case 48000:
        case 54000:
            Next4()->Reset();
            break;
        }
    }

Resetting whoever* is downstream
*we don’t know who that is when we write this component

Verbosity

    DEFINE_LOCAL_CONTEXT(TBB11aRxRateSel, CF_11RxPLCPSwitch, CF_11aRxVector );
    template<TDEMUX5_ARGS>
    class TBB11aRxRateSel : public TDemux<TDEMUX5_PARAMS>
    {
        CTX_VAR_RO (CF_11RxPLCPSwitch::PLCPState, plcp_state );
        CTX_VAR_RO (ulong, data_rate_kbps ); // data rate in kbps

    public:
        …..
    public:
        REFERENCE_LOCAL_CONTEXT(TBB11aRxRateSel);
        STD_DEMUX5_CONSTRUCTOR(TBB11aRxRateSel)
            BIND_CONTEXT(CF_11RxPLCPSwitch::plcp_state, plcp_state)
            BIND_CONTEXT(CF_11aRxVector::data_rate_kbps, data_rate_kbps)
        {}

- Declarations are written in the host language
- Language is not specialized, so often verbose
- Hinders fast prototyping

SDR manual optimizations (LUT)

    struct _init_lut {
        void operator()(uchar (&lut)[256][128]) {
            int i, j, k;
            uchar x, s, o;
            for (i = 0; i < 256; i++) {
                for (j = 0; j < 128; j++) {
                    x = (uchar)i;
                    s = (uchar)j;
                    o = 0;
                    for (k = 0; k < 8; k++) {
                        uchar o1 = (x ^ (s) ^ (s >> 3)) & 0x01;
                        s = (s >> 1) | (o1 << 6);
                        o = (o >> 1) | (o1 << 7);
                        x = x >> 1;
                    }
                    lut[i][j] = o;
                }
            }
        }
    };

Hand-written bit-fiddling code to create lookup tables for specific computations that must run very fast

Vectorization

[Figure: the WiFi receiver block diagram (removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet; edge labels Packet start, Channel info, Packet info)]

- Beneficial to process items in chunks

- But how large can chunks be?

My Own Frustrations

- Implemented several PHY algorithms in FPGA
  - Never been able to reuse them: the complexity of interfacing (timing and precision) was higher than rewriting!
- Implemented several PHY algorithms in Sora
  - Better reuse, but still difficult
  - Spent 2h figuring out which internal state variable I hadn’t initialized when I borrowed a piece of code from another project

I want tools that allow me to write reusable code and incrementally build ever more complex systems!

Improving this situation

New wireless programming platform
1. Code written in a high-level language
2. Compiler deals with low-level code optimization
3. Same code compiles on different platforms (not there just yet!)

Challenges
1. Design PL abstractions that are intuitive and expressive
2. Design efficient compilation schemes (to multiple platforms)

What is special about wireless
1. … that affects abstractions: large degree of separation b/w data and control
2. … that affects compilation: need high-throughput stream processing

Our Choice: Domain Specific Language

- What are domain-specific languages?
- Examples:
  - Make
  - SQL
- Benefits:
  - Language design captures specifics of the task
  - This enables the compiler to optimize better

Why is wireless code special?

- Wireless = lots of signal processing
- Control vs data flow separation
- Data processing elements:
  - FFT/IFFT, Coding/Decoding, Scrambling/Descrambling
  - Predictable execution and performance, independent of data
- Control flow elements:
  - Header processing, rate adaptation

Programming model

[Figure: the WiFi receiver block diagram (removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet; edge labels Packet start, Channel info, Packet info)]

What do we want code to look like?

Example: IEEE 802.11a scrambler: S(x) = x^7 + x^4 + 1

Ziria:

    x <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp
    };
    emit (y)
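To make the update rule concrete, here is a minimal Python sketch of the same loop. The function name scramble and the list-based register are illustrative assumptions, not part of Ziria. The slice st[0:6] = st[1:7] mirrors the inclusive Ziria ranges scrmbl_st[0:5] := scrmbl_st[1:6].

```python
def scramble(bits, state=None):
    """Scramble a list of 0/1 values with a 7-bit shift register.

    `state` defaults to all ones (an assumed seed for this sketch).
    """
    st = list(state) if state is not None else [1] * 7
    out = []
    for x in bits:                 # x <- take
        tmp = st[3] ^ st[0]        # feedback bit
        st[0:6] = st[1:7]          # shift: st[0..5] := st[1..6]
        st[6] = tmp
        out.append(x ^ tmp)        # emit y
    return out


# An all-zero input exposes the keystream produced by the register alone.
keystream = scramble([0] * 8)
```

Because the register update does not depend on the input bits, applying scramble twice with the same initial state returns the original bits, so the same routine can serve as the descrambler in this simplified model.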

What do we not want to optimize?

- We assume efficient DSP libraries:
  - FFT
  - Viterbi/Turbo decoding
- The same ones are used in many standards:
  - WiFi, WiMax, LTE
- These are readily available:
  - FPGA (Xilinx, Altera)
  - DSP (coprocessors)
  - CPUs (Volk, Sora libraries, Spiral)
- Most of PHY design is in connecting these blocks

Layout

- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

Ziria: A 2-layer design

- Lower layer
  - Imperative C-like code for manipulating bits, bytes, arrays, etc.
  - NB: You can plug in any C function in this layer
- Higher layer
  - A monadic language for specifying and staging stream processors
  - Enforces clean separation between control and data flow, clean state semantics
- Runtime implements low-level execution model
- Monadic pipeline staging language facilitates aggressive compiler optimizations

Ziria: control-aware stream abstractions

A stream transformer t, of type:

    ST T a b

[Figure: t, with inStream (a) and outStream (b)]

A stream computer c, of type:

    ST (C v) a b

[Figure: c, with inStream (a), outStream (b) and outControl (v)]

Staging a pipeline, in diagrams

    repeat { v <- (c1 >>> t1) ; t2 >>> t3 }

[Figure: blocks c1, t1, t2, t3; labels C and T]

- “Vertical composition” (along data path -- “arrows”)
- “Horizontal composition” (along control path -- “monads”)
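One way to build intuition for the two kinds of composition is with Python generators: yielded values play the role of the data stream, and a generator's return value plays the role of the control value. This is only an analogy under that assumption. The names read_length, payload, frame and negate are hypothetical and are not Ziria APIs.

```python
def read_length(it):
    """Computer: takes one item, emits nothing, returns it as the control value."""
    n = next(it)
    return n
    yield  # never reached; its presence makes this function a generator


def payload(n):
    """Build a computer that takes n items and emits each one doubled."""
    def computer(it):
        for _ in range(n):
            yield 2 * next(it)
    return computer


def frame(it):
    """Horizontal composition: seq { n <- read_length; payload(n) }."""
    n = yield from read_length(it)   # the control value flows to the next stage
    yield from payload(n)(it)


def negate(stream):
    """Transformer: maps every item, never returns a control value by itself."""
    for x in stream:
        yield -x


# Assumes the input holds at least n + 1 items (length header plus payload).
horizontal = list(frame(iter([2, 10, 20, 99])))
# Vertical composition: the output stream of frame feeds negate.
vertical = list(negate(frame(iter([2, 10, 20, 99]))))
```

In this model the header value 2 never appears on the output stream; it only decides how many items the second stage consumes, which is the separation between control and data that the diagram describes.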

Running example: WiFi Scrambler

    let comp scrambler() =
      var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
      var tmp: bit;
      var y: bit;

      repeat seq {
        x <- take;

        do {
          tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
          scrmbl_st[0:5] := scrmbl_st[1:6];
          scrmbl_st[6] := tmp;
          y := x ^ tmp;
        };

        emit y
      }
    in ...

Scrambler code, annotated: computational method

- let comp scrambler() = starts the definition of a computational method
- in <rest of the code> ends the definition

Scrambler code, annotated: variables

- Local variables: scrmbl_st, tmp, y (declared with var)
- Types: bit, array of bits (arr[7] bit)
- Constants: {'1,'1,'1,'1,'1,'1,'1}

Scrambler code, annotated: special-purpose computers

- Special-purpose computers: take, emit

Scrambler code, annotated: imperative code

- Imperative (C/Matlab-like) code: the body of do { ... }

Scrambler code, annotated: computers and transformers

[Figure: repeat wraps take, do, emit; x comes in, y goes out]

Whole program

    read >>> do_something >>> write

Reads and writes can come from RF, IP, file, dummy

Computation language primitives

- Define control flow
- Two groups:
  - Transformers
  - Computers

Transformers

Map:

    let f(x : int) =
      var y : int := 42;
      y := y + 1;
      return (x+y);
    in
    read >>> map f >>> write

Repeat:

    let comp f() =
      seq {
        (x : int) <- take;
        if (x > 0) then
          emit 1
      }
    in
    read >>> repeat f() >>> write
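The two transformer constructors can be modelled with Python generators. This is a rough sketch under assumptions, not the Ziria implementation: zmap, zrepeat, pipe, take and EndOfStream are hypothetical helper names, and end-of-input handling is invented for the sketch so that it terminates.

```python
class EndOfStream(Exception):
    """Signals that the input stream is exhausted."""


def take(it):
    """Take one item, converting iterator exhaustion into EndOfStream."""
    try:
        return next(it)
    except StopIteration:
        raise EndOfStream from None


def zmap(f):
    """map f: apply a function to every item of the stream."""
    def transformer(stream):
        for x in stream:
            yield f(x)
    return transformer


def zrepeat(computer):
    """repeat c: restart computer c every time it finishes."""
    def transformer(stream):
        it = iter(stream)
        try:
            while True:
                yield from computer(it)
        except EndOfStream:
            return
    return transformer


def pipe(*stages):
    """s1 >>> s2 >>> ...: feed each stage's output into the next stage."""
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def add43(x):
    """The function used with map: y starts at 42, is incremented, then added."""
    y = 42
    y = y + 1
    return x + y


def emit_one_if_positive(it):
    """The computer used with repeat: take x, emit 1 only when x > 0."""
    x = take(it)
    if x > 0:
        yield 1


result = list(pipe(zmap(add43), zrepeat(emit_one_if_positive))([-50, 0, -43, 7]))
```

Note the difference the sketch highlights: map produces exactly one output per input, while the repeated computer may emit nothing for some inputs.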

Computers

While:

    while (!crc > 0) {
      x <- take;
      do { crc := search(x); }
    }

If-then-else:

    if (rate == CR_12) then
      emit enc12(x);
    else
      emit enc23(x);

Also: take, emit, for

Expression language – data processing

- Mix of C and Matlab
- Can be directly linked to any C function
- Subset of data types (mainly fixed point):

    <basetype> ::= bit | bool | double | int | int8 | int16 | int32
                 | complex | complex16 | complex32
                 | struct TYPENAME
                 | arr <basetype>
                 | arr[INTEGER] <basetype>
                 | arr[length(VARNAME)] <basetype>

Expression language - example

    let build_coeff(pcoeffs:arr[64] complex16, ave:int16, delta:int16) =
      var th:int16;

      th := ave - delta * 26;
      for i in [64-26, 26] {
        pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
        th := th + delta
      };
      th := th + delta;
      for i in [1,26] {
        pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
        th := th + delta
      }
    in

Annotations:
- Function: build_coeff
- Array: [64-26, 26] (equivalent to [64-26:64])
- Fixed-point complex numbers: complex16
- External C function: cos_int16, sin_int16

Libraries

Ziria header:

    let external v_sub_complex32(c:arr complex32, a:arr[length(c)] complex32, b:arr[length(c)] complex32) : () in

C method:

    int __ext_v_sub_complex32(struct complex32* c, int len, struct complex32* a, int __unused_2, struct complex32* b, int __unused_1)

Libraries (mainly linked to existing Sora libraries):
- SIMD instructions, FFT and Viterbi, fixed-point trigonometry, visualisation

Frequently Asked Questions

- Why define a new language? Why not use C/Matlab/<your favourite language>?
- How do you share state?
- Why use let x = 20+3*z in instead of x := 20 + 3*z;?
- Why x <- take and not x := take?

Question: How do you implement a teleport message?

[Figure: pipeline of Frequency mixing, Equalizer and Decoding, with a reconfiguration message carrying new_freq]

Answer: Use repeat to reinitialize in the new state

    let processor() =
      var new_freq := X; // initialize
      repeat {
        ret <- ( freq_mixing(new_freq)
                 >>> equalizer
                 >>> decoding
               );
        do { new_freq := ret }
      }

[Figure: repeat wrapping Freq_mixing, Equalizer and Decoding]
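The restart-with-new-state pattern can be sketched in Python as follows. Everything here is a stand-in chosen for illustration: the ("data", v) and ("retune", f) item shapes, the function names, and the way the pipeline is collapsed into a single loop are assumptions, not how Ziria represents streams.

```python
def run_until_retune(it, freq, out):
    """One pass of freq_mixing(freq) >>> equalizer >>> decoding.

    Appends (freq, sample) pairs to `out`. Returns the new frequency when a
    ("retune", f) item is decoded, or None when the input ends.
    """
    for kind, value in it:
        if kind == "retune":
            return value              # control value handed back to repeat
        out.append((freq, value))     # data processed at the current frequency
    return None


def processor(items, initial_freq):
    it = iter(items)
    out = []
    new_freq = initial_freq           # var new_freq := X
    while True:                       # repeat { ... }
        ret = run_until_retune(it, new_freq, out)
        if ret is None:               # input exhausted: stop the sketch
            return out
        new_freq = ret                # do { new_freq := ret }


trace = processor(
    [("data", 1), ("data", 2), ("retune", 5), ("data", 3)], initial_freq=0
)
```

No message is sent backwards through the pipeline: the decoder finishes with a control value, and the next iteration of the loop rebuilds the pipeline with the new frequency.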

Layout

- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

How to write a compiler?

- Haskell + libraries
  - Parsing, code generation, flexible types, pattern matching
- First version in <2 months
- Easily extensible
- Moral: compilers can be a useful tool!

Compilation – High-level view

- Expression language -> C code
- Computation language -> Execution model
- Numerous optimizations on the way:
  - Vectorization
  - Lookup tables
  - Conventional optimizations: folding, inlining, …

Execution model: How to execute code?

[Figure: the WiFi receiver block diagram (removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet; edge labels Packet start, Channel info, Packet info)]

Runtime

Actions:
- tick()
- process(x)

Return values:
- YIELD (data_val)
- SKIP
- DONE (control_val)

[Figure: the runtime drives blocks B1 and B2 with tick() and process(x); the blocks return YIELD and DONE]

Q: Why do we need ticks?
A: Example: emit 1; emit 2; emit 3 (it produces output without taking any input, so there is no x to pass to process(x))

Execution model - example

    let comp test1() =
      repeat {
        (x:int) <- take;
        emit x;
      }
    in
    read[int] >>> test1() >>> test1() >>> write[int]

[Figure: trace of tick() and process(n) calls through the pipeline, with return values SKIP, YIELD(n) and DONE(n)]

Runtime main loop

    L1: t.init()            // init top-level component
    L2: whatis := t.tick()
    L3: if (whatis == Yield b) then { put_buf(b); goto L2 }
        else if (whatis == Skip) then goto L2
        else if (whatis == Done) then exit()
        else if (whatis == NeedInput) then { x := get_buf(); whatis := t.process(x); goto L3; }

In reality:
• Very few function calls with a CPS-based translation: every “process” function knows its continuation
• Optimizations: never tick components with trivial tick(), never generate process() for tick()-only components
• Only indirection is for bind: at different points in time, function pointers point to the correct “process” and “tick”
• Slightly different approach to input/output
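The main loop can be tried out with a small Python model. This is a sketch of the idealized loop only, not of the CPS-based code the compiler generates; the class names, the tuple encoding of return values and the handling of an exhausted input are assumptions made for illustration.

```python
YIELD, SKIP, DONE, NEED_INPUT = "YIELD", "SKIP", "DONE", "NEED_INPUT"


class EmitOneTwoThree:
    """emit 1; emit 2; emit 3: takes no input, so only tick() can drive it."""

    def init(self):
        self.pending = [1, 2, 3]

    def tick(self):
        if self.pending:
            return (YIELD, self.pending.pop(0))
        return (DONE, None)

    def process(self, x):
        raise RuntimeError("this component never asks for input")


class Echo:
    """repeat { x <- take; emit x }: every tick asks for input."""

    def init(self):
        pass

    def tick(self):
        return (NEED_INPUT, None)

    def process(self, x):
        return (YIELD, x)


def run(t, inputs):
    """Drive top-level component t; returns everything written to the output."""
    src = iter(inputs)
    out = []
    t.init()                          # L1
    what = t.tick()                   # L2
    while True:                       # L3
        tag, val = what
        if tag == YIELD:
            out.append(val)           # put_buf(b)
            what = t.tick()
        elif tag == SKIP:
            what = t.tick()
        elif tag == DONE:
            return out
        elif tag == NEED_INPUT:
            try:
                x = next(src)         # get_buf()
            except StopIteration:
                return out            # assumption: stop when input runs out
            what = t.process(x)
        else:
            raise ValueError(tag)
```

Running run(EmitOneTwoThree(), []) shows why ticks are needed: all three values are produced without a single call to process.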

How about performance?

    let comp test1() =
      repeat {
        (x:int) <- take;
        emit x + 1;
      }
    in
    read[int] >>> test1() >>> test1() >>> write[int]

Converted to maps:

    (((read >>> let auto_map_6(x: int32) = x + 1 in {map auto_map_6})
            >>> let auto_map_7(x: int32) = x + 1 in {map auto_map_7})
            >>> write)

Generated C code:

    buf_getint32(pbuf_ctx, &__yv_tmp_ln10_7_buf);
    __yv_tmp_ln11_5_buf = auto_map_6_ln2_9(__yv_tmp_ln10_7_buf);
    __yv_tmp_ln12_3_buf = auto_map_7_ln2_10(__yv_tmp_ln11_5_buf);
    buf_putint32(pbuf_ctx, __yv_tmp_ln12_3_buf);

Type-preserving transformations

    let block_VECTORIZED (u: unit) =
      var y: int;
      repeat
        let vect_up_wrap_46 () =
          var vect_ya_48: arr[4] int;
          (vect_xa_47 : arr[4] int) <- take1;
          __unused_174 <- times 4 (\vect_j_50.
            (x : int) <- return vect_xa_47[0*4+vect_j_50*1+0];
            __unused_1 <- return y := x+1;
            return vect_ya_48[vect_j_50*1+0] := y);
          emit vect_ya_48
        in vect_up_wrap_46 (tt)

    let block_VECTORIZED (u: unit) =
      var y: int;
      repeat
        let vect_up_wrap_46 () =
          var vect_ya_48: arr[4] int;
          (vect_xa_47 : arr[4] int) <- take1;
          emit
            let __unused_174 =
              for vect_j_50 in 0, 4 {
                let x = vect_xa_47[0*4+vect_j_50*1+0] in
                let __unused_1 = y := x+1 in
                vect_ya_48[vect_j_50*1+0] := y
              }
            in vect_ya_48
        in vect_up_wrap_46 (tt)

Dataflow graph iteration converted to a tight loop! In this case we got a 3x speedup

Transformations on Abstract Syntax Tree

- One of the main benefits of a compiler
- Computation optimizations:
  - Vectorization, inlining, convert to map, optimize tick/process, …
- Expression optimizations:
  - Lookup tables, inlining, calculate constant expressions, unroll loops, …
- Tests:
  - Array boundary checks

[Figure: abstract syntax tree of y := x[0,10]. Nodes: seq, y :=, EArrRead, x (TArr), EVal (VInt 0), EVal (VInt 10)]

Flexible compiler design

Example: x[0,length(x)] -> x

    subarr_inline_step e
      | EArrRead evals estart (LILength n) <- unExp e
      , EVal (VInt 0) <- unExp estart
      , TArr (Literal m) _ <- info evals
      , n == m
      = rewrite evals

Easy to add new transformations

Annotations:
- subarr_inline_step is an expressionOptimization function
- The guards read: expression e is an EArrRead on array evals (of length m) with start estart == 0 and length (LILength n), where n == m
- In the example: evals => x, estart => 0, (LILength n) => length(x)
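The same rewrite can be mimicked on a toy AST in Python. The node classes below (Var, IntLit, ArrRead) are hypothetical simplifications for this sketch and do not reproduce the compiler's actual data types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    length: int          # literal array length taken from the variable's type


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class ArrRead:
    arr: object          # the array expression being sliced
    start: object        # start index expression
    length: int          # number of elements read


def subarr_inline_step(e):
    """Rewrite x[0, length(x)] to x; leave every other expression untouched."""
    if (isinstance(e, ArrRead)
            and isinstance(e.start, IntLit) and e.start.value == 0
            and isinstance(e.arr, Var) and e.arr.length == e.length):
        return e.arr
    return e


x = Var("x", 10)
whole = subarr_inline_step(ArrRead(x, IntLit(0), 10))    # becomes x
partial = subarr_inline_step(ArrRead(x, IntLit(0), 4))   # stays a slice
```

Each optimization is one small function of this shape, which is what makes new transformations cheap to add.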

Vectorization

Idea: batch processing over multiple data items

    repeat {(x:int)<-take; emit x}   ->   repeat {(x:arr[64] int)<-take; emit x}

- Modifications of the execution model:
  - Possible since the execution model is not hardcoded in the code
  - We need to respect the operational semantics
- Benefits:
  - LUT: bits -> bytes
  - Lower overhead of the execution model (ticks/processes)
  - Faster memcpy
  - Better cache locality
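A minimal Python sketch of the idea, assuming a stage that adds 1 to every item: the vectorized stage must produce the same items in the same order, only grouped into arrays. The names and the chunk width of 4 are illustrative, and dropping a trailing partial chunk is a simplification made for the sketch.

```python
from itertools import islice


def scalar_stage(stream):
    """repeat { x <- take; emit x + 1 }: one item in, one item out."""
    for x in stream:
        yield x + 1


def vectorized_stage(stream, width=4):
    """Same computation over arrays of `width` items per take/emit."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, width))
        if len(chunk) < width:
            return        # assumption: a trailing partial chunk is dropped
        yield [x + 1 for x in chunk]


def flatten(chunks):
    return [x for chunk in chunks for x in chunk]


data = list(range(12))
same = flatten(vectorized_stage(data)) == list(scalar_stage(data))
```

The per-item overhead of the execution model is paid once per chunk instead of once per item, which is where the savings listed above come from.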

Vectorization Challenges

[Figure: transmitter pipeline. ParseHeader produces (Len, Rate). If rate == 6 Mbps the data path is CRC, scrambler, ½ encoder, interleaver, BPSK; the other branch shown is CRC, scrambler, ¾ encoder, interleaver, 64 QAM. Each block is annotated with its input and output widths, first unvectorized (for example scrambler 1 bit / 1 bit, interleaver 48 bit / 48 bit or 288 bit / 288 bit, BPSK 1 bit / 1 complex, 64 QAM 6 bit / 1 complex) and then for candidate vectorizations (for example 8 bit / 8 bit, 8 bit / 8 complex, 12 bit / 2 complex, 32 bit / 24 bit).]

Look-up Table (LUT) Optimizations

- Key optimization for Sora TX
- Identify a block of expressions that
  - transforms data
  - has limited input and output size
- Replace it with a LUT
  - Similar to FPGA compilation
- Especially beneficial for bit operations

LUT Optimizations (by example)

    let comp scrambler() =
      var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
      var tmp,y: bit;
      repeat {
        (x:bit) <- take;
        do {
          tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
          scrmbl_st[0:5] := scrmbl_st[1:6];
          scrmbl_st[6] := tmp;
          y := x ^ tmp
        };
        emit (y)
      }

Vectorization:

    let comp v_scrambler () =
      var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
      var tmp,y: bit;
      var vect_ya_26: arr[8] bit;
      let auto_map_71(vect_xa_25: arr[8] bit) =
        LUT for vect_j_28 in 0, 8 {
          vect_ya_26[vect_j_28] :=
            tmp := scrmbl_st[3]^scrmbl_st[0];
            scrmbl_st[0:+6] := scrmbl_st[1:+6];
            scrmbl_st[6] := tmp;
            y := vect_xa_25[0*8+vect_j_28]^tmp;
            return y
        };
        return vect_ya_26
      in map auto_map_71

Automatic lookup-table compilation:
- Input-vars = scrmbl_st, vect_xa_25 = 15 bits
- Output-vars = vect_ya_26, scrmbl_st = 2 bytes
- IDEA: precompile to LUT of 2^15 * 2 bytes = 64 KB
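The table sizes above can be checked with a small Python model. This is a sketch under assumptions, not what the Ziria compiler emits: the bit packing order (bit 0 in the least significant position), the key layout and all function names are choices made for this example.

```python
def step(state, xs):
    """Bit-level reference: run the scrambler over the bits in xs."""
    st = list(state)
    ys = []
    for x in xs:
        tmp = st[3] ^ st[0]
        st[0:6] = st[1:7]
        st[6] = tmp
        ys.append(x ^ tmp)
    return st, ys


def pack(bits):
    """Pack a list of bits into an int, bits[0] in the least significant bit."""
    return sum(b << i for i, b in enumerate(bits))


def unpack(value, n):
    return [(value >> i) & 1 for i in range(n)]


def build_lut():
    """Key: 7 state bits | 8 input bits << 7 (15 bits in total).
    Entry: 8 output bits | 7 new state bits << 8 (fits in 2 bytes)."""
    lut = [0] * (1 << 15)
    for key in range(1 << 15):
        st, ys = step(unpack(key & 0x7F, 7), unpack(key >> 7, 8))
        lut[key] = pack(ys) | (pack(st) << 8)
    return lut


def scramble_bytes(lut, data, state=0x7F):
    """Scramble whole bytes with one table lookup per byte."""
    out = []
    for byte in data:
        entry = lut[state | (byte << 7)]
        out.append(entry & 0xFF)
        state = entry >> 8
    return out


LUT = build_lut()
```

The table has 2^15 entries of at most 16 bits each, matching the 64 KB estimate, and eight iterations of bit manipulation collapse into a single lookup.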

Question

How to implement bit permutation as LUT? (idea from SORA):

    out_arr := perm({0,2,3,1}, in_arr);
    in_arr = {1,2,3,4}  ->  out_arr = {1,3,4,2}

Hint: permutation indices are constants!

Answer

    let perm(p : arr int, iarr : arr bit) =
      var oarr : arr[length(p)] bit;
      var oarrtmp : arr[length(p)] bit;
      var iarr1 : arr[8] bit;
      unroll for j in [0,length(p)/8] {
        let p1 = p[j*8,8];
        iarr1 := iarr[j*8,8];
        perm8(p1,iarr1,oarrtmp);
        oarr := v_or(oarr,oarrtmp);
      }
      return oarr;
    in

    let perm8(p : arr[8] int, iarr : arr[8] bit, oarr : arr bit) =
      for i in [0,8] {
        oarr[p[i]] := iarr[i];
      }
    in

- LUT for perm8: input iarr, output oarr
- LUT size: 2^8 * sizeof(oarr)
- Constants! The permutation indices in out_arr := perm({0,2,3,1}, in_arr);
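A Python sketch of the same decomposition: since the permutation indices are constants, each input byte gets its own 256-entry table holding a full-width output mask, and the output is the OR of one lookup per byte. The sketch follows the perm8 convention oarr[p[i]] := iarr[i]. Representing bit arrays as Python integers and all function names are assumptions of this example.

```python
def perm_reference(p, bits):
    """Bit-by-bit reference: out[p[i]] = bits[i]."""
    out = [0] * len(p)
    for i, dst in enumerate(p):
        out[dst] = bits[i]
    return out


def build_perm_luts(p):
    """One 256-entry table per input byte; p must be a constant permutation."""
    assert len(p) % 8 == 0
    luts = []
    for j in range(0, len(p), 8):
        table = []
        for byte in range(256):
            mask = 0
            for i in range(8):
                if (byte >> i) & 1:
                    mask |= 1 << p[j + i]   # input bit j+i lands at output bit p[j+i]
            table.append(mask)
        luts.append(table)
    return luts


def perm_with_luts(luts, value):
    """Permute the bits of `value`: one lookup per input byte, OR-ed together."""
    out = 0
    for j, table in enumerate(luts):
        out |= table[(value >> (8 * j)) & 0xFF]
    return out


P16 = [3, 0, 1, 2, 7, 6, 5, 4, 12, 13, 14, 15, 8, 9, 10, 11]
LUTS16 = build_perm_luts(P16)
```

For an n-bit permutation this needs n/8 tables of 2^8 entries each, in line with the LUT size given above, instead of one table of 2^n entries.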

Supporting different HW architectures

- Work in progress…
- SMP vs FPGA vs ASIC
- Pipeline and data parallelism
- SIMD, coprocessors (DSP or ASIC)

Pipeline parallelism

    ofdm |>>>| decode >>> packetize

becomes

    ofdm >>> write(q1) >>> read(q1) >>> decode >>> packetize

- Thread 1, pinned to Core 1: ofdm >>> write(q1)
- Thread 2, pinned to Core 2: read(q1) >>> decode >>> packetize
- q1 is a sync queue
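The split can be illustrated structurally with two Python threads and a bounded queue. This shows only the shape of the transformation: the three stage functions are placeholders, core pinning is omitted, and Python threads do not give the true parallelism that the compiled code gets from separate cores.

```python
import queue
import threading

_END = object()   # sentinel marking the end of the stream


def ofdm(x):
    return x * 2            # placeholder stage


def decode(x):
    return x + 1            # placeholder stage


def packetize(x):
    return ("pkt", x)       # placeholder stage


def run_two_threads(samples):
    q1 = queue.Queue(maxsize=8)    # bounded, so a fast producer blocks
    packets = []

    def thread1():                 # ofdm >>> write(q1)
        for s in samples:
            q1.put(ofdm(s))
        q1.put(_END)

    def thread2():                 # read(q1) >>> decode >>> packetize
        while True:
            item = q1.get()
            if item is _END:
                break
            packets.append(packetize(decode(item)))

    workers = [threading.Thread(target=thread1),
               threading.Thread(target=thread2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return packets
```

The queue preserves order, so the two-thread version produces the same packets as running the three stages sequentially.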


Code Examples

Performance evaluation

RX, Msample*/sec (*sample = 32 bits):

                 SORA   Ziria   WiFi
    RX 6Mbps      164     156     40
    RX 12Mbps     125     100     40
    RX 24Mbps      81      67     40
    RX 48Mbps      61      52     40
    RX CCA        289     163     40

TX, Mbps:

                 SORA   Ziria   WiFi
    TX 6Mbps       54      51      6
    TX 12Mbps      98      45     12
    TX 24Mbps     145      53     24
    TX 48Mbps     231      70     48

- WiFi RX and TX measurements

Real-time LTE-like demo

Status

Released to GitHub under Apache 2.0

WiFi implementation included in release

Currently supports SORA platform

Essential dependency on CPU/SIMD

Looking into porting to other CPU-based SDRs

Layout

Introduction

Ziria Programming Language

Compilation and Execution

Case Study - WiFi Design

Conclusions

Motivation

Wireless architecture and design are fragmented:

EE considers PHY and parts of MAC

CS considers MAC and above

Opportunities for synergies are missed

Q: How to change PHY to allow better/simpler network designs?

Examples:

HARQ and/or rateless codes: don’t care about rate adaptation, just keep sending

Correlation and detection: detection takes time => huge MAC overheads and terrible efficiency

Channel impulse response and localization: more precise location information

Conventional WiFi design

PHY: standardized and cannot be changed; it is a data pipe, with correct bits getting in and out

MAC: this is where innovation happens

Conventional: CSMA

Many alternatives

Conventional cellular design

PHY and MAC are standardized

Several modes of operation, but none can be changed by applications

IP layer and above: innovations can happen

In reality:

We have standard blocks: correlators, scramblers, coders/decoders, interleavers, FFT/IFFT

Why not allow building networks from these standard blocks?

Respect certain ground rules

Allow innovations

How to design a wireless transceiver?

Challenges: performance, complexity, (cost/patents)

CDMA:

Uses pseudo-random codes against multi-path

Not as complicated to implement as OFDM-based systems

Difficult to equalise the overall wide spectrum

OFDM:

Uses subcarriers to combat multi-path with greater robustness and less complexity

OFDMA can achieve higher spectral efficiency with MIMO than CDMA

Rule of thumb: for data rates >= 10 Mbps use OFDM

OFDM

OFDM is used in most contemporary PHYs: WiFi, LTE, WiMax, 60 GHz, UWB

OFDMA is the OFDM variant used in cellular (LTE, WiMax):

Multiple users share the same OFDM symbol

MAC scheduling at sub-symbol level

Blurred distinction between MAC and PHY

WiFi TX Overview

OFDM transmitter (@54Mbps):

emits createSTSinTime();
emits createLTSinTime();
crc216(h.len) >>> scrambler() >>> encode34() >>> interleaver_m64qam() >>> modulate_64qam() >>> add_pilots() >>> ifft()

Preamble (STS, LTS): for detection and channel estimation

CRC (with padding): to check for errors

Scrambler: prevents high peaks

Interleaver: decouples errors

IFFT example

OFDM Symbol

Add pilots, perform IFFT and add cyclic prefix
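The three steps (add pilots, IFFT, cyclic prefix) can be sketched in plain Python as follows. The constants are the usual 20 MHz 802.11a/g values (64-point IFFT, 48 data subcarriers, pilots at subcarriers -21, -7, 7 and 21, 16-sample cyclic prefix). As a simplifying assumption, all pilots carry a constant value of 1, whereas the standard applies a polarity sequence. The names `idft` and `ofdm_symbol` are hypothetical, and a naive O(N^2) inverse DFT stands in for a real IFFT.

```python
import cmath

N_FFT = 64
CP_LEN = 16
PILOT_CARRIERS = (-21, -7, 7, 21)
# Subcarriers -26..26 without DC (0) and without the pilots: 48 data carriers.
DATA_CARRIERS = [k for k in range(-26, 27)
                 if k != 0 and k not in PILOT_CARRIERS]


def idft(freq):
    """Naive inverse DFT (stand-in for an IFFT)."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def ofdm_symbol(data_symbols, pilot=1 + 0j):
    """Map 48 constellation points to one 80-sample time-domain symbol."""
    assert len(data_symbols) == len(DATA_CARRIERS)
    freq = [0j] * N_FFT
    for k, sym in zip(DATA_CARRIERS, data_symbols):
        freq[k % N_FFT] = sym        # negative carriers wrap to the top bins
    for k in PILOT_CARRIERS:
        freq[k % N_FFT] = pilot      # simplified: constant pilot value
    time = idft(freq)
    return time[N_FFT - CP_LEN:] + time   # prepend the last 16 samples
```

The result has 80 samples per symbol: the first 16 are a copy of the last 16, which is what makes the symbol look periodic to a multi-path channel.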

Channel impulse response

Effect on OFDM symbol in time

Effect on OFDM symbol in frequency

Channel Estimation

Channel inversion/equalization

How to deal with multi-path?

Add cyclic prefix

Missing Cyclic Prefix
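Why the cyclic prefix matters can be checked numerically. The sketch below, in plain Python with a naive DFT, sends one small OFDM symbol through a two-tap multi-path channel and equalizes with one complex division per subcarrier. It assumes the receiver knows the channel taps exactly and that there is no noise; the names `dft`, `convolve` and `transmit_and_equalize` are hypothetical. With a prefix at least as long as the channel memory, the linear convolution of the channel looks circular inside the FFT window, so the division recovers the symbols; without it, the equalized symbols are distorted.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT / inverse DFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def convolve(x, h):
    """Linear convolution: the multi-path channel."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def transmit_and_equalize(freq_symbols, h, cp_len):
    n = len(freq_symbols)
    time = dft(freq_symbols, inverse=True)
    tx = time[n - cp_len:] + time            # add cyclic prefix
    rx = convolve(tx, h)                     # channel
    body = rx[cp_len:cp_len + n]             # remove cyclic prefix
    rx_freq = dft(body)
    h_freq = dft(list(h) + [0j] * (n - len(h)))
    return [r / c for r, c in zip(rx_freq, h_freq)]  # one-tap equalizer
```

For example, with symbols `[1, -1, 1j, -1j, 1, 1, -1, -1]` and channel taps `[1, 0.5]`, a prefix of 2 samples returns the original symbols up to rounding error, while `cp_len=0` does not.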

Scrambler

Avoids large peak-to-avg power ratios
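A minimal Python sketch of such a scrambler follows. It uses the 802.11 OFDM PHY generator polynomial x^7 + x^4 + 1; the all-ones seed is only an illustrative choice, and the function name `scramble` is hypothetical. The scrambler XORs the data with a pseudo-random sequence of period 127, so long runs of identical bits, which would otherwise map to highly correlated subcarriers and large peaks after the IFFT, are broken up.

```python
def scramble(bits, seed=0b1111111):
    """XOR the data with the x^7 + x^4 + 1 LFSR sequence.

    Applying the function twice with the same seed returns the input,
    so the same code serves as the descrambler.
    """
    state = [(seed >> i) & 1 for i in range(7)]   # state[0]=x1 ... state[6]=x7
    out = []
    for b in bits:
        fb = state[3] ^ state[6]                  # x4 XOR x7
        out.append(b ^ fb)
        state = [fb] + state[:6]                  # shift the feedback bit in
    return out
```

Scrambling an all-zero input exposes the raw sequence, which starts `0, 0, 0, 0, 1, 1, 1, 0` for the all-ones seed and repeats every 127 bits.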

WiFi RX Overview

let comp receiver() =
  seq { (removeDC() >>> t<-detectSTS())
      ; params <- ChannelEstimation()
      ; removeCP() >>> FFT() >>> ChannelEqualization(params) >>> PilotTrack() >>> RemovePilots() >>> receiveBits()
      }
in

Pilots

The channel estimate changes across OFDM symbols:

The channel itself changes

Oscillators drift between sender and receiver

Pilots are used for channel re-estimation:

Similar to the initial channel estimation

Interpolation for data points

Carrier Sensing and Synchronization

Find where the packet starts

Accurate timing is needed for the rest of RX

Also used to estimate CFO

Also used in carrier sensing
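One common way to estimate CFO from a repeating preamble can be sketched as follows; it is a generic technique and not necessarily the one used in the Ziria WiFi code. If the preamble repeats every `period` samples, a frequency offset rotates each repetition by a fixed phase relative to the previous one, and that phase can be read off the delayed autocorrelation. The function name `estimate_cfo` and its parameters are hypothetical, and the estimate is only unambiguous while the phase rotation over one period stays below pi.

```python
import cmath
import math


def estimate_cfo(samples, period, sample_rate):
    """Estimate carrier frequency offset in Hz from a periodic preamble."""
    acc = sum(samples[n + period] * samples[n].conjugate()
              for n in range(len(samples) - period))
    # phase(acc) = 2 * pi * cfo * period / sample_rate
    return cmath.phase(acc) * sample_rate / (2 * math.pi * period)
```

For the 802.11a/g short training sequence the repetition period is 16 samples at 20 Msample/sec, which keeps offsets of up to 625 kHz in either direction unambiguous.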

Preamble

Detection using correlation

Correlate for a known preamble

How do we implement this in CPU/SIMD?
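Before vectorizing, it helps to pin down the scalar computation. The Python sketch below slides a known preamble over the received samples and reports the first offset where the normalized correlation exceeds a threshold. The threshold value 0.8 and the names `correlate_at` and `detect` are assumptions for illustration; a SIMD version would compute the same multiply-accumulate over several samples per instruction.

```python
def correlate_at(samples, preamble, start):
    """Correlation and received energy over one window."""
    acc = 0j
    energy = 0.0
    for i, p in enumerate(preamble):
        s = samples[start + i]
        acc += s * p.conjugate()      # multiply-accumulate: the SIMD hot loop
        energy += abs(s) ** 2
    return acc, energy


def detect(samples, preamble, threshold=0.8):
    """Return the first offset where the preamble is found, else None.

    The test |acc|^2 >= threshold * energy * preamble_energy compares a
    normalized correlation in [0, 1] against the threshold without a
    division, so it does not depend on the received signal gain.
    """
    p_energy = sum(abs(p) ** 2 for p in preamble)
    for start in range(len(samples) - len(preamble) + 1):
        acc, energy = correlate_at(samples, preamble, start)
        if energy > 0 and abs(acc) ** 2 >= threshold * energy * p_energy:
            return start
    return None
```

Normalizing by the window energy is what lets one threshold work for both strong and weak packets; the cost is an extra energy accumulation per window, which can be updated incrementally in an optimized implementation.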

Detection

Performance of Detector

Layout

Introduction

Ziria Programming Language

Compilation and Execution

Case Study - WiFi Design

Conclusions

Conclusions

Wireless innovations will happen at the intersection of the PHY and MAC levels

We need prototypes and test-beds to evaluate ideas

PHY programming is in its infancy:

Difficult, with limited portability and scalability

Steep learning curve; difficult to compare with and extend previous work

Wireless programming is easy and fun – go for it!

http://research.microsoft.com/en-us/projects/ziria/