Rateless Wireless Networking Decoder Mikhail Volkov Edison Achelengwa Minjie Chen.

Rateless Wireless Networking Decoder

Mikhail Volkov, Edison Achelengwa, Minjie Chen

Cortex: a rateless wireless system

• Very recent work here at CSAIL (Perry, 2011)

• Uses a novel rateless code called a spinal code

• Encoder and decoder agree on a seed s0, a hash function h and an IQ constellation mapping

Spinal Encoder

• Wish to transmit a message M = m1m2 ... mn

• Break the message into k-bit segments Mi

• Apply h to generate a spine
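To illustrate the spine construction, here is a minimal Python sketch. It assumes each spine value is chained from the previous one, s_i = h(s_{i-1}, M_i), starting from the shared seed s0. Truncated SHA-256 is only a stand-in for the hash (the actual system uses Salsa), and the names `h`, `split_segments` and `make_spine` are hypothetical.

```python
import hashlib
from typing import List


def h(state: int, segment: int) -> int:
    """Stand-in hash (assumption): previous spine value + segment -> 64-bit value."""
    data = state.to_bytes(8, "big") + segment.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def split_segments(bits: str, k: int) -> List[int]:
    """Break a bit string into k-bit segments M_1, M_2, ..."""
    if len(bits) % k != 0:
        raise ValueError("message length must be a multiple of k")
    return [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]


def make_spine(bits: str, k: int, s0: int = 0) -> List[int]:
    """Chain the hash over the segments: s_i = h(s_{i-1}, M_i)."""
    spine = []
    state = s0
    for segment in split_segments(bits, k):
        state = h(state, segment)
        spine.append(state)
    return spine
```

Because of the chaining, two messages that share a prefix share the corresponding prefix of the spine, and any change in an early segment affects every later spine value.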

Spinal Encoder

• Encoder performs passes over the spine, each time generating new constellation points

• These constellation points are sent across an AWGN channel
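The pass structure can be sketched in Python as follows. This is an illustration under assumptions, not the Cortex mapping: symbol bits are derived from each spine value and the pass index with truncated SHA-256 (the real system takes them from Salsa), `C` bits per coordinate are mapped linearly onto [-1, 1], and all function names are hypothetical.

```python
import hashlib
import random
from typing import List, Tuple

C = 6  # bits per I or Q coordinate (hypothetical choice)

Point = Tuple[float, float]


def symbol_bits(spine_value: int, pass_index: int) -> int:
    """Stand-in (assumption): 16 pseudo-random bits per spine value and pass."""
    data = spine_value.to_bytes(8, "big") + pass_index.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def level(bits: int) -> float:
    """Map a C-bit integer linearly onto [-1, 1]."""
    return 2.0 * bits / ((1 << C) - 1) - 1.0


def to_point(bits: int) -> Point:
    """Use the low C bits for I and the next C bits for Q."""
    mask = (1 << C) - 1
    return (level(bits & mask), level((bits >> C) & mask))


def encode_pass(spine: List[int], pass_index: int) -> List[Point]:
    """One pass over the spine yields one constellation point per spine value."""
    return [to_point(symbol_bits(s, pass_index)) for s in spine]


def awgn(points: List[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Add independent Gaussian noise to the I and Q components."""
    return [(i + rng.gauss(0.0, sigma), q + rng.gauss(0.0, sigma)) for i, q in points]
```

Each further pass calls `encode_pass` with a new `pass_index`, so the transmitter can keep producing fresh points for the same spine until the receiver decodes.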

Spinal Decoder

• Decoder knows s0, so it can generate the 2^k possible candidate symbols s1 using h

• Each time the decoder receives a symbol y, it keeps the B best symbols from the 2^k candidates using maximum likelihood (ML)

• The transmitted message is estimated as the one with the lowest ML cost
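The decoding steps above can be sketched as a beam search in Python. This is a toy model under stated assumptions: truncated SHA-256 stands in for Salsa, the constellation mapping is a hypothetical linear one, the ML cost over the AWGN channel is taken as squared Euclidean distance, and each of the B survivors is expanded into 2^k children (B*2^k candidates per step).

```python
import hashlib
from typing import List, Tuple

K = 3   # bits per message segment
B = 4   # beam width: number of candidates kept per step
C = 8   # bits per I/Q coordinate (hypothetical)
S0 = 0  # seed shared by encoder and decoder

Point = Tuple[float, float]


def h(state: int, segment: int) -> int:
    """Stand-in hash (assumption): truncated SHA-256 instead of Salsa."""
    data = state.to_bytes(8, "big") + segment.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def point(spine_value: int, pass_index: int) -> Point:
    """Hypothetical mapping from (spine value, pass) to an IQ point in [-1, 1]^2."""
    data = spine_value.to_bytes(8, "big") + pass_index.to_bytes(4, "big")
    bits = int.from_bytes(hashlib.sha256(data).digest()[:2], "big")
    mask = (1 << C) - 1
    return (2.0 * (bits & mask) / mask - 1.0, 2.0 * (bits >> C) / mask - 1.0)


def encode(segments: List[int], passes: int) -> List[List[Point]]:
    """Return one list of constellation points per pass."""
    spine, state = [], S0
    for seg in segments:
        state = h(state, seg)
        spine.append(state)
    return [[point(s, p) for s in spine] for p in range(passes)]


def decode(received: List[List[Point]], n_segments: int) -> List[int]:
    """Keep the B lowest-cost paths at each step; return the best one at the end."""
    beam = [(0.0, S0, [])]  # (accumulated cost, spine value, segments so far)
    for i in range(n_segments):
        candidates = []
        for cost, state, path in beam:
            for seg in range(2 ** K):
                s = h(state, seg)
                step = 0.0
                for p, rx in enumerate(received):
                    ci, cq = point(s, p)
                    yi, yq = rx[i]
                    step += (yi - ci) ** 2 + (yq - cq) ** 2
                candidates.append((cost + step, s, path + [seg]))
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:B]
    return beam[0][2]
```

On a noiseless channel the true path accumulates zero cost, so `decode(encode(msg, 2), len(msg))` returns `msg`; with noise, additional passes add terms to the cost and help separate the true path from its competitors.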

Spinal Decoder

Objectives

• Implement decoder on an FPGA

• Evaluate feasibility of Cortex in a real communications system

• Identify key performance bottleneck and develop a clear strategy for developing a practical Cortex system

Micro-architecture

• Interface
  • Takes stream of constellation symbols as input
  • Outputs a message (192-bit packet)

• Decoding Stages
  • Code Enumeration
  • Add-Compare-Select
  • Suggestion Update
  • Spine Evaluator Update
  • Get output message

[Figure: mkDecoder block diagram. Blocks: Symbol Mapper f(*), Spine Evaluator (mkSalsa, h(*)), Puncturing Scheduler, Sorting module, backtrackMem. Rules: doEnumerate, doACS, suggupd, evalupd, getOutMsg. Queues: updateSymQ, toACSQ, outbitsQ. Data types on the links: EnumReq, Vect(B*2^k, EnumResp), Vect(B*2^k, MarkedCost), Vect(B, MarkedCost), Vect(B, Mark), Symbol, Msg. Input: I and Q bit streams; output: out_msg.]

Micro-architecture

• Sub-modules
  • Puncturing Scheduler
  • Spine Evaluator
  • Sorter
  • Backtrack Memory


Practical Salsa Implementation

• In practice we cannot have infinite precision floating point numbers

• Salsa produces two outputs: a 64-bit spine and 512-bit arrays of symbol bits

Development and Testing

• 3 point development and testing plan

• Critical to our success with 3 people under time constraints

Step 1: Develop Decoder backbone with dummy Sorter and Spine Evaluator. Develop Sorter and Spine Evaluator independently.

- Sorter tested with MATLAB.
- Spine Evaluator (and Salsa) tested with Python.

Step 2: Integrate Decoder with Sorter and Spine Evaluator. Ensure correctness at the architectural level:

- Modules instantiate correctly
- Rules fire as expected, no deadlocks etc.
- Timing is correct
- Bits flowing end-to-end

Step 3: Ensure correctness at the semantic level, i.e. “bit-by-bit debugging”

[Figure: test setup with blocks Python Encoder, AWGN Channel, Python Decoder and Bluespec Decoder; labels: in, out]

- Encode string with Python encoder to produce symbols
- Decode symbols and compare results


• Finally, the algorithm was tested by adding noise to the transmitted symbols

• Strictly not our concern, as long as our implementation agreed with the source code

• Algorithm worked very well

• Actually “outdid” the reference code at one point: the Python code crashed but our decoder correctly decoded the message!

Performance Analysis – FPGA frequency

• The synthesized FPGA maximum frequency is 98.035 MHz.

• Different Salsa implementations give the same FPGA frequency.

Performance Analysis – Frequency, Latency, Throughput

Performance Analysis - Area

• Sorter and SpineEvaluator take the most area

Performance Analysis - Area

• Our implementation actually fits on the FPGA, taking roughly 30% of the total area.

• Different Salsa implementations don’t vary much in device utilization.

Performance Analysis - Code

• The source code totalled 3104 lines. Of these, 1135 lines (36.6%) were test code and 1969 lines (63.4%) were non-test code.

How much better can we do?

• We used a naive O(n^2) algorithm for the sorter module. A better algorithm might reduce the cycle count from 149 to 32 in the best case, which would give roughly 5 times better performance and improve the bit rate to 7.5 Mbits/s.

• Given the current area requirement of Salsa, we could run B (B = 4) separate hashing modules in parallel. This would give 4 times better performance and improve the bit rate to 7.5*4 = 30 Mbits/s.

• If we had sufficient area on the FPGA, we could run B*2^k = 32 hash modules in parallel. This would give 32 times better performance and improve the bit rate to 7.5*32 = 240 Mbits/s.
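The projections above are simple scaling arithmetic, reproduced here as a short Python sketch. The constants come from the slide; `K = 3` is inferred from B*2^k = 32 with B = 4, and the assumption that throughput scales linearly with the number of parallel hash modules is the slide's own. The names are hypothetical.

```python
CYCLES_NAIVE = 149        # cycles with the naive O(n^2) sorter
CYCLES_BEST = 32          # best-case cycles with a better sorter
RATE_AFTER_SORTER = 7.5   # Mbits/s projected after the sorter improvement

B = 4                     # beam width
K = 3                     # inferred: B * 2**K == 32

# Exact ratio is about 4.66; the slide rounds this to 5 times.
sorter_speedup = CYCLES_NAIVE / CYCLES_BEST


def projected_rate(parallel_hash_modules: int) -> float:
    """Bit rate in Mbits/s, assuming linear scaling with parallel hash modules."""
    return RATE_AFTER_SORTER * parallel_hash_modules


rate_b_modules = projected_rate(B)             # B modules in parallel
rate_full_parallel = projected_rate(B * 2 ** K)  # one module per candidate
```

Linear scaling is an optimistic upper bound: it presumes the hash modules are the only bottleneck once the sorter is improved.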
