Advanced Hardware Architecture for Soft Decoding Reed-Solomon Codes

Stefan Scholl, Norbert Wehn

Microelectronic Systems Design Research Group

TU Kaiserslautern, Germany


Overview

• Soft decoding for the RS(255,239) code

• New hardware architecture

• Goal: large FER gain (over hard decision decoding)

• Algorithm based on information set decoding

• Complexity evaluation on a Virtex 5 FPGA


Motivation: RS / BCH Decoder Hardware

Applications:

• NASA / CCSDS

• Wireless

• VDSL

• Wired / storage

• Optical (G.709)

Widely used code: RS(255,239) or its shortened versions

Decoding Algorithms for Reed-Solomon

Hard decoding:

• Standard method

• Algebraic decoding

• Very low complexity: first chip implementations in the 1970s/80s

Algorithm:

Soft decoding:

• Algorithms:
  – Chase decoding
  – Information set decoding
  – Adaptive belief propagation
  – Kötter-Vardy

• Improved error correction: possible gain of up to 3 dB (depends on code length and code rate)

Progress in microelectronics allows for more complexity today!

We consider the widely used RS(255,239), but RS(255,239) seems to be challenging: existing hardware reaches at most a “medium gain” of 0.5–1 dB.

State-of-the-art Soft Decoder Hardware

Real and complete hardware implementations for RS(255,239):

Paper                          Year   Algorithm              Gain over HDD
An (PhD thesis, MIT)           2010   Low-complexity Chase   0.45 dB
Hsu et al. (ESSCIRC)           2011   Chase                  0.35 dB
Garcia-Herrero et al. (CSSP)   2011   Low-complexity Chase   0.3 dB
Kan et al. (ISTC)              2008   Adaptive BP            0.75 dB
Heloir et al. (NEWCAS)         2012   Stochastic Chase       0.7 dB
Scholl et al. (DATE)           2014   Information set        0.75 dB

“Low gain” hardware: < 0.5 dB (first three rows); “medium gain” hardware: 0.5–1 dB (last three rows)

State-of-the-art Hardware Implementations

Gain over hard decision decoding:

• “Low gain”: < 0.5 dB

• “Medium gain”: 0.5–1 dB

• “High gain”: > 1 dB, not yet investigated!

Literature shows: gains of up to 2 dB should be possible

Implemented Algorithm*: Information Set Decoding Approach

[Figure: binary image of the received word (received bits with their reliabilities, sorted from most reliable to least reliable) and the binary parity check matrix H, whose least reliable part is diagonalized by Gaussian elimination; example syndrome: 001000]

Syndrome weight:

• Small: only errors in the least reliable part (there H is an identity matrix, so each such error shows up as exactly one syndrome bit)

• Large: at least 1 error in the most reliable part

Order 1 processing: tentatively flip each most reliable bit (here: 1912)

Order 2 processing: tentatively flip all combinations of 2 most reliable bits (~2 million cases)

Can be seen as a low-complexity variant of ordered-statistics decoding

*A. Ahmed, R. Koetter, and N. R. Shanbhag. Performance analysis of the adaptive parity check matrix based soft-decision decoding algorithm, 2004.
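The order 0/1/2 processing can be illustrated with a small Python sketch. It assumes that the parity check matrix has already been diagonalized so that its last m columns (the least reliable positions) form an identity matrix, and that the hard decisions are already sorted from most to least reliable. The function name `isd_decode` and the syndrome-weight threshold `max_weight` are hypothetical, and keeping the candidate with the fewest flipped bits is a simplification, not necessarily the selection rule of the presented decoder. The toy code is the (7,4) Hamming code in systematic form, not RS(255,239).

```python
from itertools import combinations

import numpy as np


def isd_decode(H, y, max_weight, max_order=2):
    """Toy order-0/1/2 information set decoding over GF(2).

    H: m x n 0/1 matrix whose last m columns (least reliable part) are I.
    y: length-n hard decisions, sorted from most to least reliable.
    max_weight: hypothetical threshold on the remaining syndrome weight.
    """
    m, n = H.shape
    k_mr = n - m                      # number of most reliable positions
    s = (H @ y) % 2                   # syndrome of the received word
    best = None
    for order in range(max_order + 1):
        # order 0: no flips, order 1: single bits, order 2: all pairs
        for flips in combinations(range(k_mr), order):
            s_t = s.copy()
            for i in flips:           # tentatively flip most reliable bits
                s_t ^= H[:, i]
            # Small remaining weight: assume all remaining errors sit in the
            # identity part, where the error pattern equals the syndrome.
            if s_t.sum() <= max_weight:
                e = np.zeros(n, dtype=int)
                for i in flips:
                    e[i] = 1
                e[k_mr:] = s_t
                if best is None or e.sum() < best.sum():
                    best = e          # keep the candidate with fewest flips
    return None if best is None else y ^ best


# (7,4) Hamming code in the form [A | I3]
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
c = np.array([1, 0, 1, 1, 0, 1, 0])      # codeword: (H @ c) % 2 == 0
y = c.copy()
y[1] ^= 1                                # one error in the most reliable part
print(isd_decode(H, y, max_weight=1))    # [1 0 1 1 0 1 0]
```

Every accepted candidate is a codeword by construction: the flipped most reliable columns together with the remaining syndrome placed on the identity part cancel the syndrome completely.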

Algorithm Improvements

We add further features for improvement, mostly taken from the literature:

• Add a hard decision decoder (counters a potential error floor)

• Use three differently diagonalized parity check matrices (improves FER)
  – The diagonalized parts partially overlap
  – This allows for a sophisticated architecture (complexity reduction)

• Restrict order 2 processing to “fairly” reliable bits (250 out of 1912)
  – Requires determining an additional group of fairly reliable bits (besides the least and most reliable ones)
  – Reduces the number of order 2 processings by a factor of about 60

• Use approximate reliability sorting to enable parallelization (higher speed)

Overall loss due to complexity reduction: < 0.1 dB
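The “factor 60” and “~2 million cases” figures can be checked by counting bit pairs. This sketch assumes that order 2 pairs are formed only among the 250 fairly reliable bits and that the 1912 most reliable bits are the 2040 − 128 positions outside the diagonalized part of the 128 × 2040 binary matrix.

```python
from math import comb

MOST_RELIABLE = 2040 - 128   # 1912 bits outside the 128 diagonalized positions
FAIR = 250                   # "fairly" reliable bits kept for order 2

full_order2 = comb(MOST_RELIABLE, 2)     # all pairs of most reliable bits
restricted_order2 = comb(FAIR, 2)        # pairs among the fair bits only
reduction = full_order2 / restricted_order2

print(full_order2, restricted_order2, round(reduction, 1))  # 1826916 31125 58.7
```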

Our New Hardware Architecture

Implementation on a Virtex 5 FPGA:

• Input: 2040 bit LLRs, 8 in parallel; quantization: 6 bits

• Output: 2040 bits (hard output), 8 in parallel
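A minimal sketch of the input side, assuming a saturating uniform quantizer with a symmetric 6-bit range of [−31, 31] and a hypothetical step size `step`. It also assumes the sign convention LLR = ln(P(b=0)/P(b=1)), so a negative LLR gives a hard 1. Neither the step size nor the sign convention is stated on the slide. With 8 LLRs per cycle, one codeword takes 2040 / 8 = 255 input cycles, i.e. one 8-bit RS symbol per cycle.

```python
import numpy as np

Q_BITS = 6                        # LLR quantization width from the slide
Q_MAX = 2 ** (Q_BITS - 1) - 1     # 31; symmetric range [-31, 31] is an assumption
LLRS_PER_CYCLE = 8                # 8 LLRs enter in parallel
CYCLES = 2040 // LLRS_PER_CYCLE   # 255 cycles, one 8-bit RS symbol per cycle


def quantize_llr(llr, step=0.5):
    """Saturating uniform quantizer; `step` (LSB size) is a hypothetical value."""
    q = np.rint(np.asarray(llr, dtype=float) / step).astype(int)
    return np.clip(q, -Q_MAX, Q_MAX)


def hard_decision(q):
    # Assumed sign convention: LLR = ln(P(b=0) / P(b=1)), so negative -> bit 1
    return (q < 0).astype(int)


def reliability(q):
    # Reliability used for sorting: magnitude of the quantized LLR
    return np.abs(q)


llr = np.array([3.2, -0.1, -40.0, 0.7, 12.0, -2.4, 0.0, 15.6])  # one input cycle
q = quantize_llr(llr)
print(q, hard_decision(q), reliability(q))
```

Note that values smaller than half a quantization step collapse to 0 and lose their sign (here −0.1); a real design may treat this case differently.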

Our Hardware Architecture

Sorting:

• Finds the least reliable and fairly reliable bits

• Finds the 378 lowest out of 2040 LLRs (by magnitude)

• Shift-register-based insertion sort

• 8 sorters in parallel (approximate sorting)

• Stores the bit positions in four memories
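The sorter can be sketched in Python. `insert_bounded` mimics one step of a shift-register insertion sort that keeps only the k smallest magnitudes: larger entries shift one register onward and the largest falls off the end. The approximate variant splits the 2040 inputs round-robin over 8 independent lanes (our assumption, matching 8 LLRs arriving per cycle) and lets each lane keep ceil(378 / 8) = 48 entries. The slide does not say how the 378 positions are distributed over lanes or memories, and all names here are hypothetical. The result is approximate because a lane may discard a globally small value when it already holds 48 smaller ones, while another lane keeps larger values.

```python
import random


def insert_bounded(regs, item, k):
    """One step of a shift-register insertion sorter keeping the k smallest.

    regs holds (magnitude, bit_position) tuples in ascending order. In hardware
    every register compares against the new item in parallel; here we scan.
    """
    idx = len(regs)
    for j, (mag, _) in enumerate(regs):
        if item[0] < mag:
            idx = j
            break
    if idx < k:
        regs.insert(idx, item)   # larger entries shift one register onward
        del regs[k:]             # the largest entry falls off the end
    return regs


def lowest_k(mags, k):
    """Exact: positions of the k smallest magnitudes (ties: earlier first)."""
    regs = []
    for pos, mag in enumerate(mags):
        insert_bounded(regs, (mag, pos), k)
    return [pos for _, pos in regs]


def approx_lowest_k(mags, k, lanes=8):
    """Approximate: each lane sorts every lanes-th input independently."""
    per_lane = -(-k // lanes)    # ceil(k / lanes); lane sizing is assumed
    result = []
    for lane in range(lanes):
        regs = []
        for pos in range(lane, len(mags), lanes):
            insert_bounded(regs, (mags[pos], pos), per_lane)
        result.extend(pos for _, pos in regs)
    return result


rng = random.Random(1)
mags = [rng.randrange(32) for _ in range(2040)]   # 6-bit |LLR| values
exact = set(lowest_k(mags, 378))
approx = set(approx_lowest_k(mags, 378))
print(len(exact & approx), "of", len(exact), "positions agree")
```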

Our Hardware Architecture

Gaussian Elimination / Diagonalization:

• Original matrix stored in memory

• Diagonalization “on the fly”

• Diagonalization “column-wise”

• 2 phases: setup & elimination

• Saves ~70% hardware compared with state-of-the-art diagonalization architectures (e.g., systolic arrays)

• Three diagonalizations: exploit the overlap between them

[Figure: pipelined array eliminator; columns of the original matrix enter, columns of the eliminated matrix leave; P: fixed pivot positions!]

Our Hardware Architecture

Correction Unit:

• Performs order 1 and order 2 processing

• Parallelized order 2 processing: in 1 clock cycle, 1× order 1 and 6× order 2

• 3 instances (one for each of the 3 matrices)

• Selects the best results for output
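A rough sketch of the correction unit's schedule and output selection, under stated assumptions. The schedule assumes order 2 pairs are formed only among the 250 fairly reliable bits and that exactly 6 pairs are processed every cycle. It ignores pipeline latency and the order 1 candidates handled alongside (1912 of them, at one per cycle, need fewer cycles). The selection metric, the summed |LLR| of all flipped bits, is an assumption borrowed from ordered-statistics-style decoders; the slide only says that the best result is selected. All function names are hypothetical.

```python
from itertools import combinations, islice
from math import ceil

ORDER2_PER_CYCLE = 6   # order 2 candidates per clock cycle (from the slide)


def order2_batches(fair_positions):
    """Yield the order 2 pairs in per-cycle batches (hypothetical schedule)."""
    pairs = combinations(fair_positions, 2)
    while True:
        batch = list(islice(pairs, ORDER2_PER_CYCLE))
        if not batch:
            return
        yield batch


def metric(flipped, reliability):
    # Assumed selection metric: summed |LLR| of every bit the candidate flips
    return sum(reliability[p] for p in flipped)


def select_best(candidates, reliability):
    """Pick the candidate with the smallest metric.

    candidates: (instance_id, flipped_positions) tuples, e.g. collected from
    the three correction unit instances (one per diagonalized matrix).
    """
    return min(candidates, key=lambda c: metric(c[1], reliability))


cycles = sum(1 for _ in order2_batches(range(250)))
print(cycles, ceil(250 * 249 / 2 / ORDER2_PER_CYCLE))   # both 5188

reliability = {3: 5, 7: 1, 9: 2, 12: 8}                 # toy |LLR| values
candidates = [(0, [3, 7]), (1, [9]), (2, [7, 12])]
print(select_best(candidates, reliability))            # (1, [9])
```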

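Our Hardware Architecture

Syndrome Calculation:

• Required: syndrome with respect to the diagonalized matrix

• Strategy:
  1. First, calculate the syndrome using the original matrix
  2. Second, “diagonalize” the syndrome in the Gaussian elimination, i.e., apply the same row additions to it as to the matrix (this works because the diagonalized matrix is T·H for a row-operation matrix T, so its syndrome is T·(H·r))

• Advantage: allows the use of Galois field operations (much faster)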
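The first step, computing the syndrome with Galois field operations instead of a 128 × 2040 binary matrix-vector product, can be sketched with GF(2^8) arithmetic and Horner's rule. Assumptions: the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and the roots α^1 … α^16 are common choices, but standards differ (some start at α^0). How the 16 syndrome bytes map onto the 128 binary syndrome bits depends on how the binary image of the parity check matrix is built, and the subsequent “diagonalization” of the syndrome is not shown.

```python
import random

PRIM_POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1: a common choice, assumed here
ALPHA = 0x02        # primitive element of GF(2^8) for this polynomial


def gf_mul(a, b):
    """Multiply two GF(2^8) elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def poly_mul(p, q):
    """Multiply polynomials over GF(2^8); coefficients highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def syndromes(r, n_roots=16, first_root=1):
    """S_j = r(alpha^j) for 16 consecutive j, evaluated with Horner's rule."""
    out = []
    for j in range(first_root, first_root + n_roots):
        x = gf_pow(ALPHA, j)
        acc = 0
        for sym in r:                 # highest-degree coefficient first
            acc = gf_mul(acc, x) ^ sym
        out.append(acc)
    return out


# Generator polynomial g(x) = prod_j (x + alpha^j), j = 1..16 (root choice assumed)
g = [1]
for j in range(1, 17):
    g = poly_mul(g, [1, gf_pow(ALPHA, j)])

rng = random.Random(7)
msg = [rng.randrange(256) for _ in range(239)]
codeword = poly_mul(msg, g)              # 255 symbols, a multiple of g(x)
print(syndromes(codeword) == [0] * 16)   # True
received = list(codeword)
received[100] ^= 0x5A                    # one symbol error
print(syndromes(received))               # 16 nonzero bytes = 128 syndrome bits
```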

FPGA Implementations

State-of-the-art soft decoders for RS(255,239) with a gain > 0.5 dB (“This work”: our new architecture):

                 Kan et al.    Scholl et al.     Heloir et al.   This work
Algorithm        Adaptive BP   Information set   Stoch. Chase    Information set
Chip             Stratix II    Virtex 5          Virtex 5        Virtex 5
Flip-flops       n/a           42,000            143,000         70,200
Look-up tables   43,700        13,700            117,000         32,400
Throughput       4 Mbit/s      800 Mbit/s        50 Mbit/s       300 Mbit/s
Gain over HDD    0.75 dB       0.75 dB           0.7 dB          1.3 dB

M. Kan et al., “Hardware implementation of soft-decision decoding for Reed-Solomon code,” in Proc. 5th Int. Symp. on Turbo Codes and Related Topics, 2008.

S. Scholl and N. Wehn, “Hardware implementation of a Reed-Solomon soft decoder based on information set decoding,” DATE ’14, 2014.

R. Heloir, C. Leroux, S. Hemati, M. Arzel, and W. J. Gross, “Stochastic Chase decoder for Reed-Solomon codes,” IEEE NEWCAS 2012.


FER Comparison

[Figure: frame error rate (FER) comparison of the decoders; one curve is labeled “This work”]

Summary & Outlook

Summary:

• Proposed a new soft decoder hardware architecture for RS(255,239)

• Based on information set decoding

• Implementation with the currently best FER: gain of 1.3 dB over HDD

• New “high gain” architecture, in addition to low- and medium-gain ones

• Acceptable complexity

Future challenges:

• Improving implementation efficiency

• Architectures for the requirements of specific applications

• The approach is applicable to every linear code

Thank you for your attention!

Questions?


Our new Binary Gaussian Elimination

• Basic operation: adding rows onto other rows to form unit columns

• For our hardware: a two-phase approach

1. Setup: configures the addition patterns

2. Elimination: performs the actual elimination

• Architecture: column-by-column processing with a pipelined array

[Figure: pipelined array; columns from the original matrix enter, columns of the eliminated matrix leave; P: fixed pivot positions!]

S. Scholl, C. Stumm, and N. Wehn. Hardware Implementations of Gaussian Elimination over GF(2) for Channel Decoding Algorithms. IEEE AFRICON 2013.
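A small NumPy sketch of the two-phase idea, under our own reading of the slide. The setup phase works only on the pivot columns (for example, the least reliable positions) and records which rows must be added to which, as a matrix T. The elimination phase then streams every column of the original matrix from memory and applies that fixed pattern, so the pivot rows stay fixed. Function names are hypothetical. A real decoder would handle linearly dependent pivot columns, e.g. by trying further columns; this sketch simply raises an error.

```python
import numpy as np


def setup_phase(H, pivot_cols):
    """Phase 1: derive the row-addition pattern T with T @ H[:, pivot_cols] = I (mod 2).

    Gauss-Jordan on [B | I]: every row addition applied to the pivot block B
    is mirrored on the identity, which therefore accumulates the pattern T.
    """
    m = H.shape[0]
    B = H[:, pivot_cols] % 2
    aug = np.concatenate([B, np.eye(m, dtype=int)], axis=1)
    for col in range(m):
        rows = np.nonzero(aug[col:, col])[0]
        if rows.size == 0:
            raise ValueError("pivot columns are linearly dependent")
        p = col + rows[0]
        aug[[col, p]] = aug[[p, col]]          # bring a pivot row into place
        for r in range(m):
            if r != col and aug[r, col]:
                aug[r] ^= aug[col]             # GF(2) row addition
    return aug[:, m:]


def elimination_phase(H, T):
    """Phase 2: stream the original matrix through T one column at a time."""
    out = np.empty_like(H)
    for j in range(H.shape[1]):
        out[:, j] = (T @ H[:, j]) % 2
    return out


# Toy example: H = M @ [A | I] with M unit upper triangular (hence invertible)
rng = np.random.default_rng(0)
m, k = 5, 9
A = rng.integers(0, 2, size=(m, k))
M = np.triu(rng.integers(0, 2, size=(m, m)), 1) + np.eye(m, dtype=int)
H = (M @ np.concatenate([A, np.eye(m, dtype=int)], axis=1)) % 2

pivots = list(range(k, k + m))   # e.g. the m least reliable positions
T = setup_phase(H, pivots)
D = elimination_phase(H, T)
print(D[:, pivots])              # identity matrix at the pivot columns
```

Because the result equals T·H, applying the same T to a syndrome computed with the original matrix gives the syndrome with respect to the diagonalized matrix.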

Comparison of Gaussian elimination architectures, 128 × 2040 matrix

Design example: Reed-Solomon (255,239) code

• Binary matrix size: 128 × 2040

• Implementation on a Xilinx FPGA (Virtex 7)

Architecture      Look-up tables   Flip-flops   Throughput
SMITH*            780k*            260k*        n/a
Systolic array    82k              99k          219k matrices/s
Proposed          17k (−80%)       33k (−67%)   272k matrices/s (+25%)

* Estimated. Percentages relative to the systolic array.

Efficient Gaussian elimination is the key to efficient soft decoding!
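The percentages in the table can be reproduced from the raw numbers, assuming they are relative to the systolic array. The table entries are themselves rounded to thousands, so the results only approximately match the slide's −80 %, −67 % and +25 %.

```python
systolic = {"LUTs": 82_000, "flip-flops": 99_000, "throughput": 219_000}
proposed = {"LUTs": 17_000, "flip-flops": 33_000, "throughput": 272_000}


def relative_change(new, old):
    """Percentage change of `new` with respect to `old`."""
    return 100 * (new - old) / old


changes = {key: relative_change(proposed[key], systolic[key]) for key in systolic}
for key, value in changes.items():
    print(f"{key}: {value:+.1f} %")
# LUTs: -79.3 %
# flip-flops: -66.7 %
# throughput: +24.2 %
```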