Outline - University of California, Berkeleybwrcs.eecs.berkeley.edu/Classes/EE290C_S04/... ·...


EE290C - Spring 2004
Advanced Topics in Circuit Design
High-Speed Electrical Interfaces

Lecture 18
Components

Reed-Solomon, LDPC Decoders
Borivoje Nikolic
March 18, 2004.


Outline

Galois fields algebra
Reed-Solomon Codes
Low-density parity-check codes


Need a Bit of Algebra

Coding operations are usually done modulo-2
For circuit designers: an XOR is modulo-2 addition

Finite fields, GF(2^n)
Sets of numbers generated using generator polynomials
Any operation that involves numbers from the field produces results that are also in the field

Example: for x = 2, x^3 + x^2 + 1 = 2^3 + 2^2 + 1 = 1101 (binary)
(x^3 + x^2 + 1)(x^4 + x^2 + 1) = x^7 + x^6 + x^5 + x^4 + x^4 + x^3 + x^2 + x^2 + 1
= x^7 + x^6 + x^5 + x^3 + 1
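The polynomial product in the example can be checked with a few lines of Python. This is only a sketch under one assumption: each polynomial is stored as an integer bit mask (bit i holds the coefficient of x^i), and the helper name `clmul` is hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Multiply two polynomials over GF(2) stored as bit masks."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            # Adding a shifted copy of a is an XOR in modulo-2 arithmetic,
            # so equal terms such as x^4 + x^4 cancel automatically.
            result ^= a << shift
        shift += 1
    return result


# (x^3 + x^2 + 1)(x^4 + x^2 + 1)
product = clmul(0b1101, 0b10101)
print(bin(product))  # 0b11101001, i.e. x^7 + x^6 + x^5 + x^3 + 1
```

The XOR accumulation is the same modulo-2 addition that the slide maps to an XOR gate.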


Generator Polynomials

Generator polynomials (GP) are primitive (and therefore irreducible).
E.g. x^4 + 1 is not primitive: it factors into x + 1 and x^3 + x^2 + x + 1
Examples of GPs:

x^2 + x + 1
x^3 + x + 1
x^3 + x^2 + 1

A GP of degree n describes the field GF(2^n), which has 2^n - 1 nonzero elements
PRBS is an example sequence generated by a GP


Galois Fields

E.g. GF(2^3)
GP: x^3 + x + 1 = 0
Choose an element α (primitive root)
α = x = 010 (2); α^(2^n - 1) = α^0 = 1

α^1 = x = 010 (2)
α^2 = x^2 = 100 (4)
α^3 = x^3 = x + 1 = 011 (3)
α^4 = α·α^3 = x(x + 1) = 110 (6)
α^5 = α·α^4 = x(x^2 + x) = (x + 1) + x^2 = 111 (7)
α^6 = α^2·α^4 = x^2(x^2 + x) = x^2 + 1 = 101 (5)
α^7 = α·α^6 = x(x^2 + 1) = (x + 1) + x = 001 (1)
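The table of powers can be regenerated mechanically. The sketch below assumes 3-bit integers as the symbol representation and uses hypothetical names (`GP`, `gf8_powers`); each step multiplies by α = x with a left shift and substitutes x^3 = x + 1 whenever the degree reaches 3.

```python
GP = 0b1011  # generator polynomial x^3 + x + 1
M = 3        # symbol width in bits


def gf8_powers():
    """Return [alpha^0, alpha^1, ..., alpha^6] as 3-bit integers."""
    powers = []
    value = 1  # alpha^0
    for _ in range((1 << M) - 1):
        powers.append(value)
        value <<= 1              # multiply by alpha = x
        if value & (1 << M):     # an x^3 term appeared
            value ^= GP          # replace x^3 by x + 1
    return powers


POWERS = gf8_powers()
for k, v in enumerate(POWERS):
    print(f"alpha^{k} = {v:03b} ({v})")
```

The loop is the software equivalent of clocking a 3-bit LFSR with feedback taps given by the GP.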


Operations in Galois Fields

Addition is just an XOR
α + α^3 = 010 + 011 = 001, or
α + α^3 = x + (x + 1) = (x + x) + 1 = 001

Multiplication
α^4α^5 = α^9 = α^7α^2 = 1·α^2 = α^2 = 100, or
α^4α^5 = (x^2 + x)(x^2 + x + 1) = x^4 + x^3 + x^2 + x^3 + x^2 + x = x^4 + x
= x·x^3 + x = x(x + 1) + x = x^2 = 100
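Both operations can be sketched in Python for GF(2^3) with GP x^3 + x + 1. The exponent table and the function names `gf_add` and `gf_mul` are illustrative assumptions; multiplication adds exponents modulo 7, mirroring the α^9 = α^7·α^2 step.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # alpha^0 .. alpha^6
LOG = {v: k for k, v in enumerate(EXP)}   # field element -> exponent


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^3) is a bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Multiply by adding exponents modulo 2^3 - 1 = 7."""
    if a == 0 or b == 0:
        return 0  # zero has no logarithm
    return EXP[(LOG[a] + LOG[b]) % 7]


print(f"{gf_add(0b010, 0b011):03b}")  # alpha + alpha^3
print(f"{gf_mul(0b110, 0b111):03b}")  # alpha^4 * alpha^5
```

A table lookup like this is one common software choice; hardware multipliers typically implement the polynomial form directly with AND and XOR gates.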


Reed-Solomon Codes

Invented ~1960
A special case of BCH (Bose-Chaudhuri-Hocquenghem) codes
Starts with bits, d_i:

$$\sum_{i=0}^{m-1} d_i \alpha^i = 0$$

And replaces with symbols, p_i:

$$\sum_{i=0}^{m-1} p_i \alpha^i = 0$$

Symbols are typically 8-bit (don’t have to be)


Reed-Solomon Codes
RS(n, k) code over GF(2^m)

n = 2^m - 1 symbols
k user symbols

Minimum distance n - k + 1
Can correct up to t = (n - k)/2 errors
E.g. RS(255,239) over GF(2^8) code is used in long-haul optical communications

16 parity bytes added to 239 user bytes (7% overhead)
5.5dB coding gain @ BER = 10^-12

This coding gain moves BER from 10^-4 to 10^-15
or from 10^-5 to 10^-24 (at given SNR)
Can correct bursts up to 64b
16-way interleaved code can correct 1024-b bursts
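The RS(255,239) figures follow from the code parameters by simple arithmetic. The sketch below uses a hypothetical helper, `rs_parameters`, and assumes that a burst is aligned to symbol boundaries so that t symbols cover t·m bits.

```python
def rs_parameters(n: int, k: int, m: int, interleave: int = 1) -> dict:
    """Derived figures for an RS(n, k) code with m-bit symbols."""
    parity = n - k
    t = parity // 2  # correctable symbol errors
    return {
        "parity_symbols": parity,
        "min_distance": parity + 1,
        "correctable_symbols": t,
        "overhead_percent": 100.0 * parity / k,
        # Assumption: the burst starts on a symbol boundary.
        "burst_bits": t * m * interleave,
    }


single = rs_parameters(255, 239, 8)
interleaved = rs_parameters(255, 239, 8, interleave=16)
print(single)
print(interleaved["burst_bits"])
```

With 16 parity bytes, t = 8 symbols, which gives the 64-bit burst and, with 16-way interleaving, the 1024-bit burst quoted on the slide.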


Reed-Solomon Encoding

Assume:

$$\sum_{i=0}^{m-1} p_i \alpha^i = 0$$

E.g. GF(2^3), GP: x^3 + x + 1 = 0, 3-bit symbols
Message: 3, 4, 6, 0, 1, 1, and add p0 as a check symbol
3, 4, 6, 0, 1, 1, p0 = α^3, α^2, α^4, 0, α^0, α^0, p0

The encoded word must satisfy:

α^3α^6 + α^2α^5 + α^4α^4 + 0α^3 + α^0α^2 + α^0α^1 + p0α^0 = 0
α^2 + α^0 + α^1 + 0 + α^2 + α^1 + p0 = 0 (using α^7 = 1, so α^4α^4 = α^8 = α^1)
Therefore p0 = α^0
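The check-symbol calculation can be reproduced numerically. The sketch below is an illustration rather than the lecture's implementation: symbols are 3-bit integers, `EXP` lists α^0..α^6 for the GP x^3 + x + 1, and the names `gf_mul` and `check_symbol` are hypothetical. Because α^7 = 1, the product α^4α^4 reduces to α^1, and the weighted sum of the six message symbols comes to α^0.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # alpha^0 .. alpha^6
LOG = {v: k for k, v in enumerate(EXP)}


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def check_symbol(message):
    """Return p0 so that the sum of symbol * alpha^position is zero.

    message lists the data symbols, highest position first; p0 sits at
    position 0 and is weighted by alpha^0 = 1.
    """
    n = len(message) + 1
    acc = 0
    for idx, sym in enumerate(message):
        position = n - 1 - idx
        acc ^= gf_mul(sym, EXP[position % 7])
    # In GF(2^m) every element is its own negative, so p0 equals the sum.
    return acc


p0 = check_symbol([3, 4, 6, 0, 1, 1])
print(f"p0 = {p0} = alpha^{LOG[p0]}")
```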


Reed-Solomon Decoding

If no errors – the polynomial evaluates to 0

If there is an error e at e.g. position α^2:
α^3, α^2, α^4, 0, (α^0 + e), α^0, p0

Syndrome is α^3α^6 + α^2α^5 + α^4α^4 + 0α^3 + (α^0 + e)α^2 + α^0α^1 + p0α^0

= 0 + eα^2

The error can be detected, but can’t be corrected: one syndrome cannot separate the error value e from its position
Need to add another symbol



New constraint

$$\sum_{i=0}^{m-1} p_i \times \alpha^{ij} = 0$$

for j = 0, 1
Our example (with one symbol less):
j = 0: α^3 + α^2 + α^4 + 0 + α^0 + p1 + p0 = 0
j = 1: α^3α^6 + α^2α^5 + α^4α^4 + 0α^3 + α^0α^2 + p1α^1 + p0α^0 = 0
These reduce to p1 + p0 = 0 and p1α + p0 = α^3; therefore, p1 = p0 = 1 = α^0

The receiver calculates two syndromes, for j = 0 and j = 1

$$S_j = \sum_{i=0}^{m-1} p_i \times \alpha^{ij}$$



Adding an error e = α
Our message α^3, α^2, α^4, 0, α^0, (α^0 + e), α^0

Becomes α^3, α^2, α^4, 0, α^0, α^3, α^0

S0 = e = α (error magnitude)
S1 = eα^k = α^2, where k is the position
α^k = S1/S0

In this example S1/S0 = α^1, k = 1
The error is corrected by adding S0 to the received symbol at position k.
This can be extended to correct for multiple errors
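The single-error procedure can be sketched end to end in Python. The assumptions are: GF(2^3) with GP x^3 + x + 1, words listed highest position first, the two-check codeword α^3, α^2, α^4, 0, α^0, α^0, α^0 (that is, p1 = p0 = 1), and hypothetical function names.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # alpha^0 .. alpha^6
LOG = {v: k for k, v in enumerate(EXP)}


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def syndromes(word):
    """Return (S0, S1) for a word of at most 7 symbols."""
    n = len(word)
    s0 = s1 = 0
    for idx, sym in enumerate(word):
        position = n - 1 - idx
        s0 ^= sym                                # j = 0: plain sum
        s1 ^= gf_mul(sym, EXP[position % 7])     # j = 1: weighted sum
    return s0, s1


def correct_single_error(word):
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word)                        # no error detected
    if s0 == 0 or s1 == 0:
        raise ValueError("not a single-symbol error")
    k = (LOG[s1] - LOG[s0]) % 7                  # alpha^k = S1 / S0
    fixed = list(word)
    fixed[len(word) - 1 - k] ^= s0               # add the error magnitude
    return fixed


codeword = [3, 4, 6, 0, 1, 1, 1]
received = [3, 4, 6, 0, 1, 3, 1]                 # e = alpha at position 1
print(syndromes(received))
print(correct_single_error(received))
```

Division in the field is a subtraction of exponents, which is why the position k falls out of `LOG[s1] - LOG[s0]`.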


Correcting Multiple Errors

To correct for 2 symbol errors, need 4 checks: 2 positions + 2 error magnitudes
Decoder evaluates 4 syndromes
But it is difficult to work out the error magnitudes and error locations from them

Simultaneous equations in GF (so-called “key equations”)

Alternatives are trial-and-error solutions, known as the Berlekamp-Massey algorithm and the Euclidean algorithm


Practical RS Decoding

1. Syndrome computation

$$S_j = \sum_{i=0}^{m-1} p_i \times \alpha^{ij}$$

2. Key equation

$$\Lambda_j \cdot S_j = \Omega_j \bmod (\alpha^m)$$

Λj - error locator, Ωj - error evaluator polynomials

3. Compute error locations and error values from Λj, Ωj

Error locations - Chien’s search
Error values - Forney’s algorithm


Decoder Design


DVD RS Code
Product RS code

Chang, JSSC 02/01


Decoding Sequence


Syndrome Calculation

[Figure: syndrome cell and syndrome calculator cell]

$$S_j = \sum_{i=0}^{m-1} p_i \times \alpha^{ij}$$

Chang, JSSC 02/01

Syndrome calculation is straightforward
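One reason the syndrome step is simple is that the sum can be accumulated one received symbol at a time. The sketch below uses Horner's rule, which is an assumption about how such a cell is usually organized (the cell in the cited figure is not reproduced here); it works in GF(2^3) with GP x^3 + x + 1, and `syndrome_horner` is a hypothetical name.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # alpha^0 .. alpha^6
LOG = {v: k for k, v in enumerate(EXP)}


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def syndrome_horner(received, j: int) -> int:
    """Evaluate S_j serially: acc <- acc * alpha^j + next symbol.

    received lists symbols highest position first, so after the last
    symbol each one has been multiplied by alpha^(j * position).
    """
    alpha_j = EXP[j % 7]
    acc = 0
    for sym in received:
        acc = gf_mul(acc, alpha_j) ^ sym
    return acc


received = [3, 4, 6, 0, 1, 3, 1]
print(syndrome_horner(received, 0), syndrome_horner(received, 1))
```

Each iteration needs one constant multiplication and one XOR, so the per-symbol work does not depend on the block length.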


Solving the Key Equation

Iterative solutions using Berlekamp-Massey or Euclid’s algorithm

Both require GF multiplication and division in each iteration

Euclidean algorithm calculates the greatest common divisor among two polynomials (Sj and α2)

Solves the key equation by iterative polynomial division and multiplication

Division-free Euclidean algorithm [Shao, CICC’85]
Inversionless Berlekamp-Massey algorithm [Shayan, TCom’93]

Replaces polynomial division with cross-multiplication


Berlekamp-Massey Computation

Song, ISSCC’02


Modified Euclidean Computation

Song, ISSCC’02


Modified Euclidean Algorithm

Song, JSSC 11/02


Chien Search

Chien search evaluates polynomials

Seki, CICC’01
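As a small illustration of what "evaluates polynomials" means here, the sketch below tries every position in GF(2^3) and reports where an error locator polynomial evaluates to zero. It assumes the common convention that Λ(x) is the product of (1 + α^k x) over the error positions k, so its roots are α^(-k); the function name `chien_search` and the coefficient ordering are hypothetical choices.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # alpha^0 .. alpha^6
LOG = {v: k for k, v in enumerate(EXP)}


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def chien_search(locator, n: int = 7):
    """Return the positions i for which Lambda(alpha^-i) = 0.

    locator holds the coefficients of Lambda(x), lowest degree first.
    """
    positions = []
    for i in range(n):
        x = EXP[(-i) % 7]                 # alpha^-i
        acc = 0
        for coeff in reversed(locator):   # Horner evaluation
            acc = gf_mul(acc, x) ^ coeff
        if acc == 0:
            positions.append(i)
    return positions


print(chien_search([1, 2]))      # Lambda(x) = 1 + alpha x
print(chien_search([1, 4, 7]))   # (1 + alpha x)(1 + alpha^4 x)
```

Hardware versions typically update each term by a constant multiplication per clock instead of re-evaluating the polynomial from scratch.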


Why Use FEC?

Relax component requirements
Improve distance

3dB can double the reach of DSL

Increase capacity (rate)
3dB can improve the rate by 50% in QPSK

Reduce system cost
Improve system quality

Only total system evaluation can determine whether implementing FEC in high-speed links is worthwhile


Example RS Decoders

Song, JSSC 11/02, 10Gb/s, 4 x 2.5Gb/s 83MHz, 0.16µm CMOS, 340mW
Song, JSSC 11/02, 40Gb/s, 112MHz, 0.16µm CMOS, 360mW
Seki, CICC’01 10Gb/s, 0.18µm CMOS, 350mW
RS decoders are typically a part of bigger chips (10mm x 10mm) that include framing, Mux/Demux, 1-10W


OC-192 RS CoDec

Seki, CICC’01


Performance of RS Decoders

Azadet, JSSC 3/02


Parity Checks

(7, 4) code, 4 user bits and 3 parity checks
d6, d5, d4, d3, p2, p1, p0

Parity checks:
p0 = d3 + d5 + d6
p1 = d3 + d4 + d5
p2 = d3 + d4 + d6

The code has a distance of 3, therefore can correct 1 error


Parity Checks

Parity check equations
d6 + d5 + d3 + p0 = 0
d5 + d4 + d3 + p1 = 0
d6 + d4 + d3 + p2 = 0

Parity-check matrix (columns d6, d5, d4, d3, p2, p1, p0)
1 1 0 1 0 0 1
0 1 1 1 0 1 0
1 0 1 1 1 0 0
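A short Python sketch shows how the parity checks encode a word and how the syndrome pinpoints a single flipped bit. It takes the parity definitions p0 = d3 + d5 + d6, p1 = d3 + d4 + d5, p2 = d3 + d4 + d6 and the bit order d6, d5, d4, d3, p2, p1, p0; the function names are hypothetical.

```python
# Rows of H follow the bit order d6, d5, d4, d3, p2, p1, p0.
H = [
    [1, 1, 0, 1, 0, 0, 1],  # d6 + d5 + d3 + p0 = 0
    [0, 1, 1, 1, 0, 1, 0],  # d5 + d4 + d3 + p1 = 0
    [1, 0, 1, 1, 1, 0, 0],  # d6 + d4 + d3 + p2 = 0
]


def encode(d6, d5, d4, d3):
    p0 = d3 ^ d5 ^ d6
    p1 = d3 ^ d4 ^ d5
    p2 = d3 ^ d4 ^ d6
    return [d6, d5, d4, d3, p2, p1, p0]


def syndrome(word):
    """Modulo-2 product of H and the received word."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def correct(word):
    s = syndrome(word)
    if not any(s):
        return list(word)
    # A single error produces the column of H at the error position.
    columns = [[row[c] for row in H] for c in range(7)]
    fixed = list(word)
    fixed[columns.index(s)] ^= 1
    return fixed


sent = encode(1, 0, 1, 1)
received = list(sent)
received[2] ^= 1            # flip d4
print(syndrome(received), correct(received) == sent)
```

Single-error correction works because all seven columns of H are distinct and nonzero, which is what gives the code its distance of 3.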


Low-Density Parity-Check Code

Parity codes based on sparse H matrices
Discovered in 1960, rediscovered in 1990s
Exceptional error correcting capability

Within 0.01dB of Shannon bound

Efficient decoding based on message passing
Part of almost every new communications standard, like ADSL2+, 10GBase-T


Message Passing Analogy (Trellis)

How many people are there?

1 2 3 4 5 6 7 8
8 7 6 5 4 3 2 1
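The counting analogy can be sketched in Python under one reading of the figure (an assumption): people stand in a line, each passes a running count to the right and another to the left, and each person adds the two incoming counts plus one for themselves. The name `count_people` is hypothetical.

```python
def count_people(n: int):
    """Simulate count messages passed along a line of n people."""
    # forward[i]: count person i sends to the right (includes person i)
    forward = [i + 1 for i in range(n)]
    # backward[i]: count person i sends to the left (includes person i)
    backward = [n - i for i in range(n)]
    totals = []
    for i in range(n):
        from_left = forward[i - 1] if i > 0 else 0
        from_right = backward[i + 1] if i < n - 1 else 0
        totals.append(from_left + from_right + 1)
    return forward, backward, totals


forward, backward, totals = count_people(8)
print(forward)
print(backward)
print(totals)
```

Every person arrives at the same total using only local messages, which is the property message-passing decoders rely on.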


Message Passing Analogy (Tree)

[Figure: tree of nodes exchanging count messages; edge labels range from 1 to 9]

Output @ each node = (Marginalized sum of inputs + 1)


LDPC: Overview

[Figure: bi-partite graph connecting bit nodes X1, X2, ..., XN to parity-check nodes]

• LDPC representation by either parity check matrix or the bi-partite graph.

• Message passing: Bit-to-Check / Check-to-Bit

• Total Number of Edges: 18432

[Figure: sparse parity-check matrix of 0s and 1s, labeled Wc = 512, Wr = 4608, Column Weight = 4, Row Wt. = 36]

[GALLAGER R. G., IRE Trans. Info. Theory, Vol. 8(1962) p. 21]
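The edge count can be cross-checked from the matrix labels. As a reading of those labels (an assumption, since the figure itself is not reproduced), take 4608 as the number of columns (bits) and 512 as the number of rows (checks). Counting ones by column and by row must then agree:

$$4608 \times 4 = 512 \times 36 = 18432$$

Under the same assumption the code rate would be at least $1 - 512/4608 = 8/9$.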


Parallel LDPC Decoder Architecture

[Figure: variable-node processing elements PEv1 ... PEvN and check-node processing elements PEc1 ... PEcM connected through an interconnect fabric]

Fully parallel structure
e.g. satellite receiver:
64,000 variable node processing elements, PEv
Variable number of check node processing elements, PEc

High throughput; low power
Routing complexity [A. Blanksby and C. J. Howland, JSSC 2002]


Parallel LDPC Decoder Architecture
1Gb/s, 690mW, 1024-bit blocks, rate-1/2
Routing density 50%, 50mm^2 chip

Blanksby, JSSC 3/02


Bit-to-Check Message Computation

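[Figure: bit node Bi connected to check nodes C1 to C4; incoming messages Ri,1 ... Ri,4, outgoing message Qi,4]

Qi,4 = Ri,1 + Ri,2 + Ri,3

[Figure: adder tree with blocks labeled D, taking the prior probability and Ri,j1 ... Ri,j4 as inputs and producing Qi,j1 ... Qi,j4]

Parallel Tree-Adder Structure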
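The bit-to-check update can be sketched in Python. Each outgoing message is the sum of all incoming check messages except the one from the destination check; forming one total and subtracting each input in turn gives all outputs at once. Treating the messages as log-likelihood ratios and adding a prior term (suggested by the "Prior Prob" label in the figure) are assumptions, as is the name `bit_to_check`.

```python
def bit_to_check(prior: float, r_in):
    """Return the messages Q sent from one bit node to each check.

    r_in[j] is the incoming message R from check j. The message back
    to check j excludes r_in[j] so a check never hears its own opinion.
    """
    total = prior + sum(r_in)
    return [total - r for r in r_in]


# Four incoming messages, zero prior: the last output reproduces
# Q_i,4 = R_i,1 + R_i,2 + R_i,3.
q_out = bit_to_check(0.0, [1.0, -2.0, 0.5, 3.0])
print(q_out)
```

Computing one shared total is what allows a tree of adders to produce every output in parallel.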


Check-to-Bit Computations

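[Figure: check node Cj connected to bit nodes B1, B2, B3, ..., B36; incoming messages Q1,j ... Q36,j, outgoing message R1,j]

R1,j = f(Q2,j, Q3,j, ..., Q36,j)

$$f(Q_1, Q_2, \ldots, Q_N) \approx \prod_i \operatorname{sgn}(Q_i) \cdot \min_i |Q_i|$$

[Figure: serial datapath from Qi,j to Ri,j through LUT and Φ blocks, with a 36-cycle delay]

Recursive-Serial Adder Structure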
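The sign-and-minimum approximation of f can be sketched as follows. This is a plain min-sum check-node update written for clarity rather than speed; the exact function realized by the LUT and Φ blocks in the figure is not reproduced, and `check_to_bit` is a hypothetical name.

```python
def check_to_bit(q_in):
    """Return the messages R sent from one check node to each bit.

    For bit i the output uses every incoming message except q_in[i]:
    the product of their signs times the smallest magnitude.
    """
    out = []
    for i in range(len(q_in)):
        others = q_in[:i] + q_in[i + 1:]
        sign = 1
        for q in others:
            if q < 0:
                sign = -sign
        out.append(sign * min(abs(q) for q in others))
    return out


r_out = check_to_bit([2.0, -1.0, 3.0])
print(r_out)
```

The least reliable input dominates the output magnitude, which is why the minimum is a reasonable stand-in for the exact function.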


LDPC Code Performance

BER performance of LDPC depends on
Graph girth
Code expansion + block size
Hamming distance

“Random” parity-check matrices usually achieve good performance

The interconnect network is random, too
A 1024-bit LDPC decoder in 0.15µm occupies 7mm x 7mm with 50% density [Blanksby, Howland, JSSC’02]

Structured codes achieve good performance with structured interconnect

Permutation codes, Ramanujan and Cayley graphs


Structured LDPCs

Bit node groups

Check node groups

Construction based on Ramanujan graphs allows for hierarchical decomposition and good performance


Serial LDPC Decoder Architecture

[Figure: serial schedule over time for nodes 1 to 8 and steps A, B, C; blocks labeled PEC, PEV and SRAM]

Latency dependent on total number of nodes
Messages are stored in SRAM

Large memory requirement

Natural structure for microprocessors, DSPs, etc.
Parallelizing computation with limited PEs

[G. Al-Rawi, J. Cioffi, and M. Horowitz, ITCC 2001]


Staggered Serial LDPC Decoder

[Figure: staggered schedule over time for nodes 1 to 8 and steps A, B, C; one PEC connected through a MUX to several PEVs]

Increase number of variable node PEs
Staggered message updating

reduced complexity of PEv

Messages stored in variable node PEs
reduced memory requirement

Improved BER @ reduced iteration counts

[E. Yeo, et al. Globecom2001]


Pipelined Architecture of LDPC Decoder

[Figure: Bit to Check and Check to Bit stages connected through four MEMORY blocks]

1. Randomness of connectivity in the bi-partite graph inhibits any kind of memory reuse.

2. Two banks of memory are required, alternating between read and write.

3. Total memory requirement: 72k words


LDPC Codes Based on Galois Fields

[Y. Kou, et al. ISIT 2000]

• Codes based on GF projections are low rate.
  • No cycles of length 4 (short loop)
  • Cyclic rows
  • e.g. (1023 x 1023) code has rate of 0.68

• Column splitting
  • Each column in original matrix is split into four
  • Non-zero entries in original column are cycled through the 4 new columns
  • e.g. (1023 x 4092) code has rate of 0.75
  • Partial loss of regularity (cyclic structure)
  • Complex O(N^2) encoding

• Puncturing
  • Truncate height of PC matrix
  • Columns in the maximum zero runlength region correspond to parity bit locations
  • Cyclic encoding using direct application of PC matrix now possible


Shift Register-Based Implementation

• Staggered decoding.

• Regularity of codes based on Finite Field geometries.

[Figure: shift-register chain holding messages Q0 ... Q4092, with adders (+) tapping the chain and feeding the Check-to-Bit Message Computation Block]

[E. Yeo, et al. Globecom2001]


LDPC Performance

Djurdjevic, CommLetters 07/03


LDPC Error Floors

Richardson, Allerton’03
