Ziria: Wireless Programming for Hardware Dummies Božidar Radunović joint work with Gordon Stewart,...

Post on 15-Jan-2016


Ziria: Wireless Programming for Hardware Dummies

Božidar Radunović
joint work with Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis

http://research.microsoft.com/en-us/projects/ziria/

2

Layout
- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

3

Motivation – why this course?

Lots of innovation in PHY/MAC design
Popular experimental platform: GNURadio
- Relatively easy to program, but slow; no real network deployment
Modern wireless PHYs require high-rate DSP
Real-time platforms [SORA, WARP, …]
- Achieve protocol processing requirements, but are difficult to program, offer no code portability, and need lots of low-level hand-tuning

4

Issues for wireless researchers
CPU platforms (e.g. SORA)
- Manual vectorization, CPU placement
- Cache / data sizing optimizations
FPGA platforms (e.g. WARP)
- Latency-sensitive design, difficult for new students/researchers to break into
Portability/readability
- Manually highly optimized code is difficult to read and maintain
- Also: practically impossible to target another platform
Difficulty in writing and reusing code hampers innovation

5

Hardware Platforms
FPGA: programmer deals with hardware issues
- WARP, Airblue
CPUs: SORA bricks [MSR Asia], GNURadio blocks
- SORA was a huge breakthrough: design of RX/TX with PCI interface, 16 Gbps throughput, ~μs latency
- Very efficient C++ library
- We build on top of SORA
Many other options now available, e.g. http://myriadrf.org/

6

What is wrong with current tools?

7

Current SDR Software Tools
Portable (FPGA/CPU), graphical interface:
- Simulink, LabView
CPU-based, C/C++/Python:
- GnuRadio, SORA
Control and data separation:
- CodiPhy [U. of Colorado], OpenRadio [Stanford]
Specialized languages (DSL):
- Stream processing languages: StreamIt [MIT]
- DSLs for DSP/arrays: Feldspar [Chalmers] (we put more emphasis on control)
- Spiral

8

Issues
Programming abstraction is tied to execution model
- Programmer has to reason about how the program will be executed/optimized while writing the code
Verbose programming
Shared state
Low-level optimization
We next illustrate these on Sora code examples (other platforms have similar problems)

9

Running example: WiFi receiver

[Figure: WiFi receiver block diagram. Blocks: removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet. Control values passed between stages: packet start, channel info, packet info.]

10

How do we execute this on CPU?

[Figure: the WiFi receiver block diagram from the running example, repeated.]

11

Dataflow streaming abstractions

[Figure: a dataflow vertex; events (messages) come in, events (messages) come out.]
Predominant abstraction today [e.g. SORA, StreamIt, GnuRadio] is that of a “vertex” in a dataflow graph
- Reasonable as an abstraction of the execution model
- Unsatisfactory as a programming and compilation model
Why unsatisfactory? It does not expose:
(1) When is vertex state (re-)initialized?
(2) Under which external “control” messages can the vertex change behavior?
(3) How can a vertex transmit “control” information to other vertices?

12

Shared state
static inline
void CreateDemodGraph11a_40M (ISource*& srcAll, ISource*& srcViterbi, ISource*& srcCarrierSense)
{
    CREATE_BRICK_SINK   (drop, TDropAny, BB11aDemodCtx );
    CREATE_BRICK_SINK   (fsink, TBB11aFrameSink, BB11aDemodCtx );
    CREATE_BRICK_FILTER (desc, T11aDesc, BB11aDemodCtx, fsink );
    typedef T11aViterbi <5000*8, 48, 256> T11aViterbiComm;
    CREATE_BRICK_FILTER (viterbi, T11aViterbiComm::Filter, BB11aDemodCtx, desc );
    CREATE_BRICK_FILTER (vit0, TThreadSeparator<>::Filter, BB11aDemodCtx, viterbi);
    // 6M
    CREATE_BRICK_FILTER (di6, T11aDeinterleaveBPSK, BB11aDemodCtx, vit0 );
    CREATE_BRICK_FILTER (dm6, T11aDemapBPSK::filter, BB11aDemodCtx, di6 );
    …
    …
    CREATE_BRICK_SINK   (plcp, T11aPLCPParser, BB11aDemodCtx );
    CREATE_BRICK_FILTER (sviterbik, T11aViterbiSig, BB11aDemodCtx, plcp );
    CREATE_BRICK_FILTER (dibpsk, T11aDeinterleaveBPSK, BB11aDemodCtx, sviterbik );
    CREATE_BRICK_FILTER (dmplcp, T11aDemapBPSK::filter, BB11aDemodCtx, dibpsk );
    CREATE_BRICK_DEMUX5 (sigsel, TBB11aRxRateSel, BB11aDemodCtx, dmplcp, dm6, dm12, dm24, dm48 );
    CREATE_BRICK_FILTER (pilot, TPilotTrack, BB11aDemodCtx, sigsel );
    CREATE_BRICK_FILTER (pcomp, TPhaseCompensate, BB11aDemodCtx, pilot );
    CREATE_BRICK_FILTER (chequ, TChannelEqualization, BB11aDemodCtx, pcomp );
    CREATE_BRICK_FILTER (fft, TFFT64, BB11aDemodCtx, chequ );
    CREATE_BRICK_FILTER (fcomp, TFreqCompensation, BB11aDemodCtx, fft );
    CREATE_BRICK_FILTER (dsym, T11aDataSymbol, BB11aDemodCtx, fcomp );
    CREATE_BRICK_FILTER (dsym0, TNoInline, BB11aDemodCtx, dsym );
(Callout on the code: shared state)

13

Separation of control and data
void Reset() {
    Next0()->Reset();
    // No need to reset all paths, just reset the path we used in this frame
    switch (data_rate_kbps) {
    case 6000:
    case 9000:
        Next1()->Reset();
        break;
    case 12000:
    case 18000:
        Next2()->Reset();
        break;
    case 24000:
    case 36000:
        Next3()->Reset();
        break;
    case 48000:
    case 54000:
        Next4()->Reset();
        break;
    }
}
(Callout: resetting whoever* is downstream. *We don’t know who that is when we write this component.)

14

Verbosity
DEFINE_LOCAL_CONTEXT(TBB11aRxRateSel, CF_11RxPLCPSwitch, CF_11aRxVector );
template<TDEMUX5_ARGS>
class TBB11aRxRateSel : public TDemux<TDEMUX5_PARAMS>
{
    CTX_VAR_RO (CF_11RxPLCPSwitch::PLCPState, plcp_state );
    CTX_VAR_RO (ulong, data_rate_kbps ); // data rate in kbps
public:
    …..
public:
    REFERENCE_LOCAL_CONTEXT(TBB11aRxRateSel);
    STD_DEMUX5_CONSTRUCTOR(TBB11aRxRateSel)
        BIND_CONTEXT(CF_11RxPLCPSwitch::plcp_state, plcp_state)
        BIND_CONTEXT(CF_11aRxVector::data_rate_kbps, data_rate_kbps)
    {}
- Declarations are written in host language
- Language is not specialized, so often verbose
- Hinders fast prototyping

15

SDR manual optimizations (LUT)
struct _init_lut {
    void operator()(uchar (&lut)[256][128]) {
        int i, j, k;
        uchar x, s, o;
        for (i = 0; i < 256; i++) {
            for (j = 0; j < 128; j++) {
                x = (uchar)i;
                s = (uchar)j;
                o = 0;
                for (k = 0; k < 8; k++) {
                    uchar o1 = (x ^ (s) ^ (s >> 3)) & 0x01;
                    s = (s >> 1) | (o1 << 6);
                    o = (o >> 1) | (o1 << 7);
                    x = x >> 1;
                }
                lut[i][j] = o;
            }
        }
    }
};
Hand-written bit-fiddling code to create lookup tables for specific computations that must run very fast

16

Vectorization

[Figure: the WiFi receiver block diagram from the running example, repeated.]
- Beneficial to process items in chunks
- But how large can chunks be?

17

My Own Frustrations
Implemented several PHY algorithms in FPGA
- Never been able to reuse them: the complexity of interfacing (timing and precision) was higher than rewriting!
Implemented several PHY algorithms in Sora
- Better reuse, but still difficult
- Spent 2h figuring out which internal state variable I hadn’t initialized when I borrowed a piece of code from another project.
I want tools that allow me to write reusable code and incrementally build ever more complex systems!

18

Improving this situation
New wireless programming platform
1. Code written in a high-level language
2. Compiler deals with low-level code optimization
3. Same code compiles on different platforms (not there just yet!)
Challenges
1. Design PL abstractions that are intuitive and expressive
2. Design efficient compilation schemes (to multiple platforms)
What is special about wireless
1. … that affects abstractions: large degree of separation b/w data and control
2. … that affects compilation: need high-throughput stream processing

19

Our Choice: Domain Specific Language
What are domain-specific languages?
Examples:
- Make
- SQL
Benefits:
- Language design captures specifics of the task
- This enables compiler to optimize better

20

Why is wireless code special?
Wireless = lots of signal processing
Control vs data flow separation
Data processing elements:
- FFT/IFFT, Coding/Decoding, Scrambling/Descrambling
- Predictable execution and performance, independent of data
Control flow elements:
- Header processing, rate adaptation

21

Programming model

[Figure: the WiFi receiver block diagram from the running example, repeated.]

22

What do we want code to look like?
Example: IEEE 802.11a scrambler: S(x) = x^7 + x^4 + 1
Ziria:
x <- take;
do {
  tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
  scrmbl_st[0:5] := scrmbl_st[1:6];
  scrmbl_st[6] := tmp;
  y := x ^ tmp
};
emit (y)
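For readers without a Ziria toolchain, here is a minimal Python sketch of the same scrambler step. It assumes, as the slide's code suggests, that the Ziria slice [0:5] is inclusive (six elements) and that the state starts as seven ones; the name make_scrambler is hypothetical and not part of Ziria or Sora.

```python
def make_scrambler(init=(1, 1, 1, 1, 1, 1, 1)):
    """Return a step function implementing S(x) = x^7 + x^4 + 1 on single bits."""
    state = list(init)  # mirrors scrmbl_st: arr[7] bit

    def step(x):
        tmp = state[3] ^ state[0]   # feedback bit from taps x^4 and x^7
        state[0:6] = state[1:7]     # shift the register by one position
        state[6] = tmp              # insert the feedback bit
        return x ^ tmp              # scrambled output bit

    return step


def scramble(bits):
    """Scramble a whole list of bits with a freshly initialized register."""
    step = make_scrambler()
    return [step(b) for b in bits]
```

Because the scrambler only XORs the data with a sequence that does not depend on the data, calling scramble twice with a fresh register recovers the original bits.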

23

What do we not want to optimize?
We assume efficient DSP libraries:
- FFT
- Viterbi/Turbo decoding
The same blocks are used in many standards:
- WiFi, WiMax, LTE
These are readily available:
- FPGA (Xilinx, Altera)
- DSP (coprocessors)
- CPUs (Volk, Sora libraries, Spiral)
Most of PHY design is in connecting these blocks

24

Layout
- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

25

Ziria: A 2-layer design
Lower layer
- Imperative C-like code for manipulating bits, bytes, arrays, etc.
- NB: You can plug in any C function in this layer
Higher layer
- A monadic language for specifying and staging stream processors
- Enforces clean separation between control and data flow, clean state semantics
Runtime implements low-level execution model
Monadic pipeline staging language facilitates aggressive compiler optimizations

26

Ziria: control-aware stream abstractions
A stream transformer t, of type ST T a b:
- consumes inStream (a) and produces outStream (b)
A stream computer c, of type ST (C v) a b:
- consumes inStream (a), produces outStream (b), and returns a control value on outControl (v)

27

Staging a pipeline, in diagrams
repeat { v <- (c1 >>> t1) ; t2 >>> t3 }
[Figure: blocks c1 and t1 (together a computer, C) followed by t2 and t3 (together a transformer, T).]
“Vertical composition” (along data path -- “arrows”)
“Horizontal composition” (along control path -- “monads”)
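The two kinds of composition can be imitated with Python generators. This is only a rough sketch under stated assumptions, not how Ziria is implemented: a component is modelled as a function from an input iterator to a generator, a computer's control value is the generator's return value, and all names (map_t, detect_start, pipe, receiver) are hypothetical.

```python
def map_t(f):
    """Transformer: applies f to every input item and never returns a control value."""
    def run(inp):
        for x in inp:
            yield f(x)
    return run


def detect_start(inp):
    """Computer: drops items until a non-zero one arrives and returns it as control value."""
    yield from ()            # emits nothing; makes this function a generator
    for x in inp:
        if x != 0:
            return x
    return None              # input ended before anything was detected


def pipe(up, down):
    """Data-path composition (up >>> down); returns the upstream control value, if any."""
    def run(inp):
        box = []

        def tapped():
            box.append((yield from up(inp)))

        yield from down(tapped())
        return box[0] if box else None
    return run


def receiver(samples):
    """Control-path composition: v <- (c1 >>> t1) ; t2 >>> t3."""
    inp = iter(samples)      # both stages read from the same input stream
    v = yield from pipe(detect_start, map_t(abs))(inp)
    if v is None:
        return
    yield from pipe(map_t(lambda x: v * x), map_t(lambda y: y + 1))(inp)
```

With the input 0, 0, 3, 1, 2 the first stage consumes 0, 0, 3 and returns 3; the second stage is then configured with that value and processes the remaining items.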

28

Running example: WiFi Scrambler
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };
    emit y
  }
in ...

29

[Scrambler code repeated from the previous slide, ending with: in <rest of the code>]
Callouts:
- Start defining computational method (let comp scrambler() = ...)
- End defining computational method (in <rest of the code>)

30

[Scrambler code repeated from the running example.]
Callouts:
- Local variables
- Types: bit, array of bits
- Constants

31

[Scrambler code repeated from the running example.]
Callout: special-purpose computers (take, emit)

32

[Scrambler code repeated from the running example.]
Callout: imperative (C/Matlab-like) code (the do { ... } block)

33

[Scrambler code repeated from the running example.]
[Figure: repeat wraps the sequence take, do, emit; x flows out of take and y flows into emit.]
Computers and transformers

34

Whole program

Read >>> do_something >>> write

Reads and writes can come from RF, IP, file, dummy

35

Computation language primitives
Define control flow
Two groups:
- Transformers
- Computers

36

Transformers
Map:
  let f(x : int) =
    var y : int := 42;
    y := y + 1;
    return (x+y);
  in
  read >>> map f >>> write
Repeat:
  let f(x : int) =
    x <- take;
    if (x > 0) then
      emit 1
  in
  read >>> repeat f >>> write

37

Computers
While:
  while (!crc > 0) {
    x <- take;
    do { crc := search(x); }
  }
If-then-else:
  if (rate == CR_12) then
    emit enc12(x);
  else
    emit enc23(x);
Also: take, emit, for

38

Expression language – data processing
Mix of C and Matlab
Can be directly linked to any C function
Subset of data types (mainly fixed point):
<basetype> ::= bit | bool | double | int | int8 | int16 | int32
             | complex | complex16 | complex32
             | struct TYPENAME
             | arr <basetype>
             | arr[INTEGER] <basetype>
             | arr[length(VARNAME)] <basetype>

39

Expression language - example
let build_coeff(pcoeffs:arr[64] complex16, ave:int16, delta:int16) =
  var th:int16;
  th := ave - delta * 26;
  for i in [64-26, 26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  };
  th := th + delta;
  for i in [1,26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  }
in
Callouts:
- Function
- Array (equivalent to [64-26:64])
- Fixed-point complex numbers
- External C function

40

Libraries
Ziria header:
  let external v_sub_complex32(c:arr complex32, a:arr[length(c)] complex32, b:arr[length(c)] complex32) : () in
C method:
  int __ext_v_sub_complex32(struct complex32* c, int len, struct complex32* a, int __unused_2, struct complex32* b, int __unused_1)
Libraries (mainly linked to existing Sora libraries):
- SIMD instructions, FFT and Viterbi, fixed-point trigonometry, visualisation

41

Frequently Asked Questions
Why define a new language? Why not use C/Matlab/<your favourite language>?
How do you share state?
Why use let x = 20+3*z in instead of x := 20 + 3*z;?
Why x <- take and not x := take?

42

Question: How do you implement a teleport message?
[Figure: pipeline of Frequency mixing, Equalizer and Decoding, with a reconfiguration message carrying new_freq.]

43

Answer: Use repeat to reinitialize in the new state
let processor() =
  var new_freq := X; // initialize
  repeat {
    ret <- ( freq_mixing(new_freq) >>> equalizer >>> decoding );
    do { new_freq := ret }
  }
[Figure: repeat wraps the pipeline Freq_mixing, Equalizer, Decoding.]

44

Layout
- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

45

How to write a compiler?
Haskell + libraries
- Parsing, code generation, flexible types, pattern matching
First version in <2 months
Easily extensible
Moral: compilers can be a useful tool!

46

Compilation – High-level view
Expression language -> C code
Computation language -> Execution model
Numerous optimizations on the way:
- Vectorization
- Lookup tables
- Conventional optimizations: folding, inlining, …

47

Execution model: How to execute code?

[Figure: the WiFi receiver block diagram from the running example, driven by the Runtime; blocks B1 and B2 are shown receiving tick() and process(x) calls.]
Actions:
- tick()
- process(x)
Return values:
- YIELD (data_val)
- SKIP
- DONE (control_val)
Q: Why do we need ticks?
A: Example: emit 1; emit 2; emit 3 (this code consumes no input, so only tick() can drive it)

49

Execution model - example
let comp test1() =
  repeat {
    (x:int) <- take;
    emit x;
  }
in
read[int] >>> test1() >>> test1() >>> write[int]
[Figure: execution trace for this pipeline, showing the runtime calling tick() (returning SKIP, SKIP, then YIELD(n)) and process(n) (returning YIELD(n) and finally DONE(n)) on the blocks.]

50

Runtime main loop
L1: t.init()   // init top-level component
L2: whatis := t.tick()
L3: if (whatis == Yield b) then { put_buf(b); goto L2 }
    else if (whatis == Skip) then goto L2
    else if (whatis == Done) then exit()
    else if (whatis == NeedInput) then { x = get_buf(); whatis := t.process(x); goto L3; }
In reality:
• Very few function calls with a CPS-based translation: every “process” function knows its continuation
• Optimizations: never tick components with trivial tick(), never generate process() for tick()-only components
• Only indirection is for bind: at different points in time, function pointers point to the correct “process” and “tick”
• Slightly different approach to input/output
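The main loop above can be sketched in a few lines of Python. This is an illustration under assumptions, not the generated runtime: components are plain objects with tick() and process(x), results are (tag, value) pairs, and the class names AddOne and EmitThree are hypothetical stand-ins for repeat { x <- take; emit x + 1 } and emit 1; emit 2; emit 3.

```python
YIELD, SKIP, DONE, NEED_INPUT = "YIELD", "SKIP", "DONE", "NEED_INPUT"


class AddOne:
    """Needs one input item for every output item."""
    def tick(self):
        return (NEED_INPUT, None)

    def process(self, x):
        return (YIELD, x + 1)


class EmitThree:
    """Makes progress on ticks alone; this is why ticks are needed."""
    def __init__(self):
        self.pending = [1, 2, 3]

    def tick(self):
        if self.pending:
            return (YIELD, self.pending.pop(0))
        return (DONE, None)

    def process(self, x):
        raise RuntimeError("EmitThree never asks for input")


def run(component, inputs):
    """Drive one top-level component until it is done or the input runs out."""
    inputs = iter(inputs)
    out = []
    what, val = component.tick()                  # L2
    while True:                                   # L3
        if what == YIELD:
            out.append(val)                       # put_buf(b)
        elif what == DONE:
            return out
        elif what == NEED_INPUT:
            try:
                x = next(inputs)                  # get_buf()
            except StopIteration:
                return out
            what, val = component.process(x)
            continue                              # goto L3
        what, val = component.tick()              # goto L2 (after YIELD or SKIP)
```

For example, run(AddOne(), [1, 2, 3]) collects the incremented inputs, while run(EmitThree(), []) produces output without reading anything.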

51

How about performance?
let comp test1() =
  repeat {
    (x:int) <- take;
    emit x + 1;
  }
in
read[int] >>> test1() >>> test1() >>> write[int]

After the compiler converts each repeat to a map:
(((read >>> let auto_map_6(x: int32) = x + 1 in {map auto_map_6})
        >>> let auto_map_7(x: int32) = x + 1 in {map auto_map_7})
        >>> write)

Generated C code:
buf_getint32(pbuf_ctx, &__yv_tmp_ln10_7_buf);
__yv_tmp_ln11_5_buf = auto_map_6_ln2_9(__yv_tmp_ln10_7_buf);
__yv_tmp_ln12_3_buf = auto_map_7_ln2_10(__yv_tmp_ln11_5_buf);
buf_putint32(pbuf_ctx, __yv_tmp_ln12_3_buf);

52

Type-preserving transformations
let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      __unused_174 <- times 4 (\vect_j_50.
        (x : int) <- return vect_xa_47[0*4+vect_j_50*1+0];
        __unused_1 <- return y := x+1;
        return vect_ya_48[vect_j_50*1+0] := y);
      emit vect_ya_48
    in vect_up_wrap_46 (tt)

let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      emit
        let __unused_174 =
          for vect_j_50 in 0, 4 {
            let x = vect_xa_47[0*4+vect_j_50*1+0] in
            let __unused_1 = y := x+1 in
            vect_ya_48[vect_j_50*1+0] := y
          }
        in vect_ya_48
    in vect_up_wrap_46 (tt)

Dataflow graph iteration converted to tight loop! In this case we got a 3x speedup

53

Transformations on Abstract Syntax Tree
One of the main benefits of a compiler
Computation optimizations:
- Vectorization, inlining, convert to map, optimize tick/process, …
Expression optimizations:
- Lookup tables, inlining, calculate constant expressions, unroll loops, …
Tests:
- Array boundary checks
[Figure: AST of y := x[0,10] under a seq node: an assignment (y:=) whose right-hand side is an EArrRead node over x (type TArr), with children EVal (VInt 0) and EVal (VInt 10).]

54

Flexible compiler design
Example: x[0,length(x)] -> x

subarr_inline_step e
  | EArrRead evals estart (LILength n) <- unExp e
  , EVal (VInt 0) <- unExp estart
  , TArr (Literal m) _ <- info evals
  , n == m
  = rewrite evals

Easy to add new transformations
Callouts:
- subarr_inline_step is an expressionOptimization function
- The guards read: if expression e is an EArrRead on array evals (of length m), with start estart == 0 and length (LILength n) == m
- In the example: evals => x, estart => 0, (LILength n) => length(x)
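The same rewrite can be mimicked outside Haskell. The following Python sketch is an assumption-laden analogue, not the Ziria compiler's code: the node classes Var, IntLit and ArrRead are hypothetical, and the array length stored on Var stands in for the TArr type information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    length: int        # static array length, standing in for TArr (Literal m)


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class ArrRead:
    array: object      # expression producing an array
    start: object      # expression for the start index
    length: int        # number of elements read


def subarr_inline_step(e):
    """Rewrite x[0, length(x)] to x; return any other expression unchanged."""
    if (isinstance(e, ArrRead)
            and isinstance(e.array, Var)
            and e.start == IntLit(0)
            and e.length == e.array.length):
        return e.array
    return e
```

Each guard in the Haskell version corresponds to one condition in the if; adding another rewrite means adding another small function of this shape.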

55

Vectorization
Idea: batch processing over multiple data items
  repeat {(x:int)<-take; emit x}   ->   repeat {(x:arr[64] int)<-take; emit x}
Modifications of the execution model:
- Possible since the execution model is not hardcoded in the code
- We need to respect the operational semantics
Benefits:
- LUT: bits -> bytes
- Lower overhead of the execution model (ticks/processes)
- Faster memcpy
- Better cache locality
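A small Python sketch of why batching lowers the execution-model overhead. It assumes that each take/emit pair costs one runtime step; the function names and the step counter are hypothetical and only illustrate the idea.

```python
def run_scalar(items, f):
    """One runtime step (take + emit) per item."""
    out, steps = [], 0
    for x in items:
        out.append(f(x))
        steps += 1
    return out, steps


def run_vectorized(items, f, width):
    """One runtime step per chunk of `width` items; same output as run_scalar."""
    if len(items) % width != 0:
        raise ValueError("input length must be a multiple of the vector width")
    out, steps = [], 0
    for i in range(0, len(items), width):
        chunk = items[i:i + width]          # take an array of `width` items
        out.extend(f(x) for x in chunk)     # tight inner loop over the chunk
        steps += 1                          # a single emit for the whole chunk
    return out, steps
```

The outputs are identical, which is the sense in which the transformation respects the operational semantics, while the number of runtime steps drops by the vector width.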

Vectorization Challenges
[Figure: WiFi TX pipeline. ParseHeader produces (Len, Rate) and selects a branch. If rate == 6 Mbps: CRC, scrambler, ½ encoder, interleaver, BPSK. Otherwise (branch shown): CRC, scrambler, ¾ encoder, interleaver, 64 QAM. Each block is annotated with its input and output widths at several vectorization levels; the annotations surviving extraction include 1 bit / 1 bit, 2 bit / 1 bit, 4 bit / 3 bit, 1 bit / 1 complex, 6 bit / 1 complex, 48 bit / 48 bit, 288 bit / 288 bit, 8 bit / 8 bit, 8 bit / 4 bit, 8 bit / 8 complex, 32 bit / 24 bit, 12 bit / 2 complex, 8 bit / 24 bit and 8 bit / 216 bit.]

Look-up Table (LUT) Optimizations
Key optimization for Sora TX
Identify a block of expressions that transforms data and has limited input and output size
Replace it with a LUT
- Similar to FPGA compilation
Especially beneficial for bit operations

58

LUT Optimizations (by example)
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  repeat {
    (x:bit) <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp
    };
    emit (y)
  }

Vectorization:
let comp v_scrambler () =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  var vect_ya_26: arr[8] bit;
  let auto_map_71(vect_xa_25: arr[8] bit) =
    LUT for vect_j_28 in 0, 8 {
      vect_ya_26[vect_j_28] :=
        tmp := scrmbl_st[3]^scrmbl_st[0];
        scrmbl_st[0:+6] := scrmbl_st[1:+6];
        scrmbl_st[6] := tmp;
        y := vect_xa_25[0*8+vect_j_28]^tmp;
        return y
    };
    return vect_ya_26
  in map auto_map_71

Automatic lookup-table compilation
Input-vars = scrmbl_st, vect_xa_25 = 15 bits
Output-vars = vect_ya_26, scrmbl_st = 2 bytes
IDEA: precompile to LUT of 2^15 * 2 = 64 KB
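The table the compiler builds can be approximated by hand. The Python sketch below is an illustration under assumptions (the packing of state and data into a 15-bit key, and all function names, are hypothetical): it precomputes, for every combination of 7 state bits and 8 input bits, the 8 output bits and the next state, then scrambles a byte per lookup.

```python
def scramble_bits(state, bits):
    """Reference bit-at-a-time scrambler; state is a list of 7 bits."""
    state = list(state)
    out = []
    for x in bits:
        tmp = state[3] ^ state[0]
        state = state[1:] + [tmp]
        out.append(x ^ tmp)
    return state, out


def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def build_lut():
    """2^15 entries; each entry packs next state (7 bits) and output byte (8 bits)."""
    lut = [0] * (1 << 15)
    for key in range(1 << 15):
        state = to_bits(key >> 8, 7)
        data = to_bits(key & 0xFF, 8)
        new_state, out = scramble_bits(state, data)
        lut[key] = (to_int(new_state) << 8) | to_int(out)
    return lut


def scramble_bytes(lut, data, state=0x7F):
    """Scramble a sequence of bytes with one table lookup per byte."""
    out = []
    for byte in data:
        entry = lut[(state << 8) | byte]
        state = entry >> 8
        out.append(entry & 0xFF)
    return out
```

Every entry fits in 15 bits, so storing it in 2 bytes gives the 2^15 * 2 bytes quoted on the slide.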

59

Question
How to implement bit permutation as LUT? (idea from SORA)
out_arr := perm({0,2,3,1}, in_arr);
in_arr = {1,2,3,4} -> out_arr = {1,3,4,2}
Hint: permutation indices are constants!

60

Answer
let perm(p : arr int, iarr : arr bit) =
  var oarr : arr[length(p)] bit;
  var oarrtmp : arr[length(p)] bit;
  var iarr1 : arr[8] bit;
  unroll for j in [0,length(p)/8] {
    let p1 = p[j*8,8];
    iarr1 := iarr[j*8,8];
    perm8(p1,iarr1,oarrtmp);
    oarr := v_or(oarr,oarrtmp);
  }
  return oarr;
in

let perm8(p : arr[8] int, iarr : arr[8] bit, oarr : arr bit) =
  for i in [0,8] {
    oarr[p[i]] := iarr[i];
  }
in

out_arr := perm({0,2,3,1}, in_arr);

Callouts:
- Constants! (the permutation indices)
- perm8 maps iarr to oarr; LUT size: 2^8 * sizeof(oarr)
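A Python sketch of the per-byte LUT idea. It follows the scatter convention of perm8 (oarr[p[i]] := iarr[i]); note that the sample output {1,3,4,2} on the question slide corresponds to the opposite, gather reading (out[i] = in[p[i]]), so the two differ on that example. All function names are hypothetical.

```python
def perm_direct(p, bits):
    """Reference: scatter each input bit to its target position."""
    out = [0] * len(p)
    for i, b in enumerate(bits):
        out[p[i]] = b
    return out


def build_perm_luts(p):
    """One 256-entry table per 8-bit input chunk; entries are full-width output masks."""
    if len(p) % 8 != 0:
        raise ValueError("permutation length must be a multiple of 8")
    luts = []
    for j in range(0, len(p), 8):
        table = []
        for byte in range(256):
            mask = 0
            for i in range(8):
                if (byte >> i) & 1:
                    mask |= 1 << p[j + i]
            table.append(mask)
        luts.append(table)
    return luts


def perm_lut(luts, bits):
    """Permute by looking up each input byte and OR-ing the partial results."""
    acc = 0
    for j, table in enumerate(luts):
        chunk = bits[8 * j:8 * j + 8]
        byte = sum(b << i for i, b in enumerate(chunk))
        acc |= table[byte]
    return [(acc >> k) & 1 for k in range(8 * len(luts))]
```

Because the permutation indices are constants, the tables can be built once ahead of time; at run time only the lookups and the OR remain, mirroring the unrolled loop with v_or.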

61

Supporting different HW architectures
Work in progress…
SMP vs FPGA vs ASIC
Pipeline and data parallelism
SIMD, coprocessors (DSP or ASIC)

62

Pipeline parallelism
ofdm |>>>| decode >>> packetize
becomes
ofdm >>> write(q1) >>> read(q1) >>> decode >>> packetize
which is split into two threads:
- Thread 1, pinned to Core 1: ofdm >>> write(q1)
- Thread 2, pinned to Core 2: read(q1) >>> decode >>> packetize
q1 is a sync queue
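The split can be sketched with Python's standard threading and queue modules. This is a rough analogue under assumptions: Python threads are not pinned to cores and share one interpreter lock, so the sketch shows the structure of the two stages and the queue between them, not the performance; run_pipelined and the stage functions passed to it are hypothetical.

```python
import queue
import threading

_END = object()   # sentinel marking the end of the stream


def run_pipelined(samples, ofdm, decode, packetize):
    """Run ofdm in one thread and decode >>> packetize in another, linked by q1."""
    q1 = queue.Queue(maxsize=16)   # bounded queue standing in for the sync queue
    out = []

    def stage1():                  # ofdm >>> write(q1)
        for s in samples:
            q1.put(ofdm(s))
        q1.put(_END)

    def stage2():                  # read(q1) >>> decode >>> packetize
        while True:
            item = q1.get()
            if item is _END:
                return
            out.append(packetize(decode(item)))

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return out
```

Since the queue preserves order, the pipelined result equals what the sequential pipeline ofdm >>> decode >>> packetize would produce.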

63

Code Examples

64

Performance evaluation
WiFi RX and TX measurements

RX, Msample*/sec (*sample = 32 bits)
              SORA   Ziria   WiFi
RX 6Mbps       164     156     40
RX 12Mbps      125     100     40
RX 24Mbps       81      67     40
RX 48Mbps       61      52     40
RX CCA         289     163     40

TX, Mbps
              SORA   Ziria   WiFi
TX 6Mbps        54      51      6
TX 12Mbps       98      45     12
TX 24Mbps      145      53     24
TX 48Mbps      231      70     48
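One way to read the tables is as real-time headroom. The Python sketch below assumes that the WiFi column is the rate required for real-time operation (the slide does not say so explicitly) and computes how many times faster than that rate each Ziria measurement is; the function name headroom is hypothetical.

```python
# (SORA, Ziria, WiFi) per row, copied from the tables above.
RX_MSAMPLES = {
    "6Mbps": (164, 156, 40),
    "12Mbps": (125, 100, 40),
    "24Mbps": (81, 67, 40),
    "48Mbps": (61, 52, 40),
    "CCA": (289, 163, 40),
}
TX_MBPS = {
    "6Mbps": (54, 51, 6),
    "12Mbps": (98, 45, 12),
    "24Mbps": (145, 53, 24),
    "48Mbps": (231, 70, 48),
}


def headroom(table):
    """Ratio of Ziria throughput to the WiFi column, per row."""
    return {rate: ziria / wifi for rate, (_sora, ziria, wifi) in table.items()}
```

Under that assumption every ratio is above 1, i.e. all measured Ziria configurations run faster than the WiFi column requires.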

65

Real-time LTE-like demo

66

Status
Released to GitHub under Apache 2.0
WiFi implementation included in release
Currently supports SORA platform
- Essential dependency on CPU/SIMD
Looking into porting to other CPU-based SDRs

67

Layout
- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

68

Motivation
Wireless architecture and design are fragmented:
- EE considers PHY and parts of MAC
- CS considers MAC and above
Opportunities for synergies missed:
- Q: How to change PHY to allow better/simpler network designs?

69

Examples: HARQ and/or rateless codes: don’t care about rate adaptation, just keep sending

Correlation and detection: detection takes time => huge MAC overheads and terrible efficiency

Channel impulse response and localization: more precise location information

70

Conventional WiFi design
PHY: standardized and cannot be changed
- Data pipe, correct bits getting in and out
MAC: where innovation happens
- Conventional: CSMA
- Many alternatives

71

Conventional cellular design PHY and MAC are standardized

Several modes of operations but none can be changed by applications

IP layer and above: innovations can happen

72

In reality:
We have standard blocks:
- Correlators, scramblers, coders/decoders, interleavers, FFT/IFFT
Why not allow building a network from these standard blocks?
- Respect certain ground rules
- Allow innovations

73

How to design a wireless transceiver?
Challenges: performance, complexity, (cost/patents)
CDMA:
- Uses pseudo-random code against multi-path
- Not as complicated to implement as OFDM-based systems
- Difficult to equalise the overall wide spectrum
OFDM:
- Uses subcarriers to combat multi-path with greater robustness and less complexity
- OFDMA can achieve higher spectral efficiency with MIMO than CDMA
Rule of thumb: For data rates >= 10 Mbps use OFDM

74

OFDM
OFDM is used in most contemporary PHYs:
- WiFi, LTE, WiMax, 60 GHz, UWB
OFDMA is the OFDM variant used in cellular (LTE, WiMax):
- Multiple users sharing the same OFDM symbol
- MAC scheduling at sub-symbol level
- Blurred distinction between MAC and PHY

75

WiFi TX Overview
OFDM transmitter (@54Mbps):
  emits createSTSinTime();
  emits createLTSinTime();
  crc216(h.len) >>> scrambler() >>> encode34() >>> interleaver_m64qam() >>> modulate_64qam() >>> add_pilots() >>> ifft()
Preamble for detection and channel estimation
CRC (with padding) to check for errors
Scrambler prevents high peaks
Interleaver decouples errors

76

IFFT example

77

OFDM Symbol Add pilots, perform IFFT and add cyclic prefix

78

Channel impulse response

79

Effect on OFDM symbol in time

80

Effect on OFDM symbol in frequency

81

Channel Estimation

82

Channel inversion/equalization

83

How to deal with multi-path? Add cyclic prefix
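The effect of the cyclic prefix can be checked numerically. The Python sketch below is a toy model under assumptions (an 8-subcarrier symbol, a hypothetical 3-tap channel, no noise, and a plain O(n^2) DFT instead of an FFT library): when the prefix is at least as long as the channel memory, the multi-path channel reduces to one complex gain per subcarrier, which a single division equalizes.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def add_cyclic_prefix(symbol, cp_len):
    """Prepend the last cp_len time-domain samples of the symbol."""
    return symbol[len(symbol) - cp_len:] + symbol


def channel(samples, taps):
    """Multi-path channel: linear convolution, truncated to the input length."""
    return [sum(taps[m] * samples[t - m] for m in range(len(taps)) if t - m >= 0)
            for t in range(len(samples))]


def receive(samples, cp_len, n):
    """Drop the cyclic prefix and go back to the frequency domain."""
    return dft(samples[cp_len:cp_len + n])


# Toy example: 8 QPSK subcarriers, 3-tap channel, prefix of 2 samples.
X = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
TAPS = [1, 0.5, 0.25j]
H = dft(TAPS + [0] * (len(X) - len(TAPS)))          # per-subcarrier channel gain
Y = receive(channel(add_cyclic_prefix(idft(X), 2), TAPS), 2, len(X))
X_HAT = [y / h for y, h in zip(Y, H)]               # channel inversion
```

Here Y[k] equals H[k] * X[k] up to rounding, so X_HAT recovers the transmitted subcarriers.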

84

Missing Cyclic Prefix

85

Scrambler

Avoids large peak-to-avg power ratios

86

WiFi RX Overview
let comp receiver() =
  seq { (removeDC() >>> t<-detectSTS())
      ; params <- ChannelEstimation()
      ; removeCP() >>> FFT() >>> ChannelEqualization(params) >>> PilotTrack() >>> RemovePilots() >>> receiveBits()
      }
in

87

Pilots
Channel estimate changes across OFDM symbols:
- Channel changes
- Drift in oscillators between sender and receiver
Pilots are used for channel re-estimation:
- Similar to initial channel estimation
- Interpolation for data points

88

Carrier Sensing and Synchronization
Find where the packet starts
- Accurate timing needed for the rest of RX
- Also used to estimate CFO
- Also used in carrier sensing

89

Preamble

90

Detection using correlation Correlate for a known preamble

How do we implement this in CPU/SIMD?
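As a starting point for that question, here is a scalar Python sketch of preamble detection by correlation. It is only an illustration under assumptions: the preamble values and threshold are hypothetical, and a real CPU/SIMD implementation would replace the inner multiply-accumulate loop with vector instructions.

```python
def correlate(samples, preamble):
    """Magnitude of the sliding correlation of complex samples with a known preamble."""
    n = len(preamble)
    scores = []
    for start in range(len(samples) - n + 1):
        acc = 0j
        for k in range(n):
            # multiply-accumulate: the part that maps naturally onto SIMD lanes
            acc += samples[start + k] * preamble[k].conjugate()
        scores.append(abs(acc))
    return scores


def detect_preamble(samples, preamble, threshold):
    """Return the offset of the strongest correlation peak, or None if below threshold."""
    scores = correlate(samples, preamble)
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

The returned offset marks where the packet starts, which is the timing reference the rest of the receiver needs.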

91

Detection

92

Performance of Detector

93

Layout
- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

94

Conclusions
Wireless innovations will happen at the intersection of PHY and MAC levels
- We need prototypes and test-beds to evaluate ideas
PHY programming is in its infancy
- Difficult, limited portability and scalability
- Steep learning curve, difficult to compare and extend previous works
Wireless programming is easy and fun – go for it!
http://research.microsoft.com/en-us/projects/ziria/