Outline - University of California, Berkeleybwrcs.eecs.berkeley.edu/Classes/EE290C_S04/... ·...
1
EE290C - Spring 2004
Advanced Topics in Circuit Design
High-Speed Electrical Interfaces
Lecture 18
Components
Reed-Solomon, LDPC Decoders
Borivoje Nikolic
March 18, 2004
2
Outline
Galois fields algebra
Reed-Solomon Codes
Low-density parity-check codes
3
Need a Bit of Algebra
Coding operations are usually done modulo 2
For circuit designers: an XOR is modulo-2 addition
Finite fields, GF(2^n)
Sets of numbers generated using generator polynomials
Any operation that involves numbers from the field produces results that are also in the field
Example: with x = 2, x^3 + x^2 + 1 = 2^3 + 2^2 + 1 = 1101 (binary)
(x^3 + x^2 + 1)(x^4 + x^2 + 1) = x^7 + x^6 + x^5 + x^4 + x^4 + x^3 + x^2 + x^2 + 1 = x^7 + x^6 + x^5 + x^3 + 1
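The polynomial product in the example can be checked with a few lines of Python. This is a minimal sketch, assuming polynomials are stored as integers whose bit i is the coefficient of x^i; the function name `gf2_poly_mul` is made up for illustration.

```python
def gf2_poly_mul(a: int, b: int) -> int:
    """Multiply two polynomials with modulo-2 coefficients.

    Each polynomial is a bit mask: bit i holds the coefficient of x^i.
    Partial products are combined with XOR, which is modulo-2 addition.
    """
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1
            result ^= a    # add (XOR) the shifted copy of a
        a <<= 1            # multiply a by x
        b >>= 1            # move to the next coefficient of b
    return result


# (x^3 + x^2 + 1) * (x^4 + x^2 + 1)
product = gf2_poly_mul(0b1101, 0b10101)
print(format(product, "b"))  # 11101001, i.e. x^7 + x^6 + x^5 + x^3 + 1
```

The two x^4 terms and the two x^2 terms cancel because 1 XOR 1 = 0.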
4
Generator Polynomials
Generator polynomials (GP) are primitive (and therefore irreducible).
E.g. x^4 + 1 is not primitive: it factors into (x + 1)(x^3 + x^2 + x + 1)
Examples of GPs:
x^2 + x + 1
x^3 + x + 1
x^3 + x^2 + 1
A GP of degree n describes a field with 2^n - 1 nonzero elements
PRBS is an example of a sequence generated by a GP
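To illustrate the PRBS remark, the sketch below runs a 3-bit linear feedback shift register whose recurrence comes from x^3 + x + 1, read as s[k+3] = s[k+1] XOR s[k]. The function name, the tap convention and the seed are assumptions made for this example.

```python
def prbs_from_gp(taps, seed, length):
    """Generate `length` bits from a linear recurrence over GF(2).

    The next bit is the XOR of the state bits at the offsets in `taps`,
    where state[0] is the oldest bit.
    """
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]   # shift and append the feedback bit
    return out


# x^3 + x + 1  ->  s[k+3] = s[k+1] XOR s[k], taps at offsets 0 and 1
bits = prbs_from_gp(taps=(0, 1), seed=(1, 0, 0), length=14)
print(bits[:7])   # one full period
```

With a primitive degree-3 polynomial and a nonzero seed, the sequence repeats every 2^3 - 1 = 7 bits.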
5
Galois Fields
E.g. GF(2^3)
GP: x^3 + x + 1 = 0
Choose an element α (primitive root)
α = x = 010 (2); α^(2^n - 1) = α^0 = 1
α^1 = x = 010 (2)
α^2 = x^2 = 100 (4)
α^3 = x^3 = x + 1 = 011 (3)
α^4 = α·α^3 = x(x + 1) = 110 (6)
α^5 = α·α^4 = x(x^2 + x) = (x + 1) + x^2 = 111 (7)
α^6 = α^2·α^4 = x^2(x^2 + x) = x^2 + 1 = 101 (5)
α^7 = α·α^6 = x(x^2 + 1) = (x + 1) + x = 001 (1)
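The table of powers can be generated mechanically: multiply by x (shift left) and, whenever an x^3 term appears, replace it using x^3 = x + 1. A minimal sketch, with `gf8_powers` as an illustrative name:

```python
GP = 0b1011  # generator polynomial x^3 + x + 1


def gf8_powers():
    """Return [alpha^0, alpha^1, ..., alpha^6] as 3-bit integers."""
    powers = []
    value = 1                  # alpha^0
    for _ in range(7):
        powers.append(value)
        value <<= 1            # multiply by alpha = x
        if value & 0b1000:     # an x^3 term appeared
            value ^= GP        # reduce it with x^3 = x + 1
    return powers


for k, v in enumerate(gf8_powers()):
    print(f"alpha^{k} = {v:03b} ({v})")
```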
6
Operations in Galois Fields
Addition is just an XOR
α + α^3 = 010 + 011 = 001, or
α + α^3 = x + (x + 1) = (x + x) + 1 = 001
Multiplication
α^4·α^5 = α^9 = α^7·α^2 = 1·α^2 = α^2 = 100, or
α^4·α^5 = (x^2 + x)(x^2 + x + 1) = x^4 + x^3 + x^2 + x^3 + x^2 + x = x^4 + x = x·x^3 + x = x(x + 1) + x = x^2 = 100
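Both operations are easy to sketch in software. The version below assumes a log/antilog table built from the GF(2^3) powers listed on the previous slide; the names `EXP`, `LOG`, `gf8_add` and `gf8_mul` are chosen for this example only.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # EXP[k] = alpha^k for x^3 + x + 1
LOG = {v: k for k, v in enumerate(EXP)}   # inverse lookup: value -> exponent


def gf8_add(a, b):
    """Addition in GF(2^3) is a bitwise XOR."""
    return a ^ b


def gf8_mul(a, b):
    """Multiply by adding exponents modulo 2^3 - 1 = 7."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


print(gf8_add(0b010, 0b011))   # alpha + alpha^3
print(gf8_mul(0b110, 0b111))   # alpha^4 * alpha^5
```

A table lookup like this suits software; hardware decoders usually implement the multiplier directly as XOR/AND logic.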
7
Reed-Solomon Codes
Invented ~1960
A special case of BCH (Bose-Chaudhuri-Hocquenghem) codes
Starts with bits, d_i:
$\sum_{i=0}^{m-1} d_i \alpha^i = 0$
And replaces them with symbols, p_i:
$\sum_{i=0}^{m-1} p_i \alpha^i = 0$
Symbols are typically 8-bit (don't have to be)
8
Reed-Solomon Codes
RS(n, k) code over GF(2^m)
n = 2^m - 1 symbols
k user symbols
Minimum distance n - k + 1
Can correct up to t = (n - k)/2 errors
E.g. the RS(255,239) code over GF(2^8) is used in long-haul optical communications
16 parity bytes added to 239 user bytes (7% overhead)
5.5dB coding gain @ BER = 10^-12
This coding gain moves BER from 10^-4 to 10^-15, or from 10^-5 to 10^-24 (at given SNR)
Can correct bursts up to 64b
16-way interleaved code can correct 1024-b bursts
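The numbers quoted for RS(255,239) follow from n, k and m. The helper below is an illustrative sketch (`rs_params` is not from any library). Its burst figure assumes the burst is aligned to symbol boundaries, so that t symbols cover t·m bits.

```python
def rs_params(n, k, m, interleave=1):
    """Derive basic figures of an RS(n, k) code with m-bit symbols."""
    parity = n - k
    t = parity // 2                      # correctable symbol errors
    return {
        "parity_symbols": parity,
        "min_distance": parity + 1,      # n - k + 1
        "t": t,
        "overhead": parity / k,          # parity relative to user data
        # Assumption: burst aligned to symbol boundaries.
        "burst_bits": t * m * interleave,
    }


single = rs_params(255, 239, 8)
interleaved = rs_params(255, 239, 8, interleave=16)
print(single)
print(interleaved["burst_bits"])
```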
9
Reed-Solomon Encoding
Assume: e.g. GF(2^3), GP: x^3 + x + 1 = 0, 3-bit symbols
Message: 3, 4, 6, 0, 1, 1, and add p0 as a check symbol
3, 4, 6, 0, 1, 1, p0 = α^3, α^2, α^4, 0, α^0, α^0, p0
The encoded word must satisfy:
$\sum_{i=0}^{m-1} p_i \alpha^i = 0$
α^3·α^6 + α^2·α^5 + α^4·α^4 + 0·α^3 + α^0·α^2 + α^0·α^1 + p0·α^0 = 0
α^2 + α^0 + α^1 + 0 + α^2 + α^1 + p0 = 0 (using α^7 = 1, so α^4·α^4 = α^8 = α^1)
Therefore p0 = α^0 = 1
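The check-symbol calculation can be reproduced numerically. The sketch below assumes the leftmost message symbol sits at position 6 and p0 at position 0, and uses the GF(2^3) power table for x^3 + x + 1; all function names are illustrative. Since α^7 = 1, the term α^4·α^4 equals α^8 = α^1, and the sum of the data terms comes out as 1, so p0 = 1 = α^0.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # EXP[k] = alpha^k in GF(2^3)
LOG = {v: k for k, v in enumerate(EXP)}


def gf8_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def single_check_symbol(data):
    """Return p0 such that sum_i p_i * alpha^i = 0.

    `data` lists the symbols from the highest position down to position 1.
    """
    n = len(data) + 1
    acc = 0
    for offset, sym in enumerate(data):
        position = n - 1 - offset
        acc ^= gf8_mul(sym, EXP[position % 7])
    return acc   # p0 * alpha^0 must cancel the accumulated sum


p0 = single_check_symbol([3, 4, 6, 0, 1, 1])
print(p0)
```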
10
Reed-Solomon Decoding
If no errors – the polynomial evaluates to 0
If there is an error e at e.g. position α^2:
α^3, α^2, α^4, 0, (α^0 + e), α^0, p0
Syndrome is α^3·α^6 + α^2·α^5 + α^4·α^4 + 0·α^3 + (α^0 + e)·α^2 + α^0·α^1 + p0·α^0 = 0 + e·α^2
The error can be detected, but can't be corrected
Need to add another symbol
11
Reed-Solomon Decoding
New constraint:
$\sum_{i=0}^{m-1} p_i \alpha^{i \cdot j} = 0$ for j = 0, 1
Our example (with one symbol less):
j = 0: α^3 + α^2 + α^4 + 0 + α^0 + p1 + p0 = 0
j = 1: α^3·α^6 + α^2·α^5 + α^4·α^4 + 0·α^3 + α^0·α^2 + p1·α^1 + p0·α^0 = 0
p1 + p0 = 0; p1·α + p0 = α + 1, therefore p1 = p0 = 1 = α^0
The receiver calculates two syndromes, for j = 0 and j = 1:
$S_j = \sum_{i=0}^{m-1} p_i \alpha^{i \cdot j}$
12
Reed-Solomon Decoding
Adding an error e = α
Our codeword α^3, α^2, α^4, 0, α^0, (α^0 + e), α^0
Becomes α^3, α^2, α^4, 0, α^0, α^3, α^0
S0 = α = e, the error magnitude
S1 = α^2 = e·α^k, where k is the position
α^k = S1/S0
In this example S1/S0 = α^1, k = 1
The error is corrected by adding S0 to the received symbol at position k.
This can be extended to correct for multiple errors
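The single-error procedure can be sketched end to end. This example assumes the two-check codeword with p1 = p0 = 1, i.e. 3, 4, 6, 0, 1, 1, 1, with the leftmost symbol at position 6; the helper names are illustrative and not taken from any library.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # EXP[k] = alpha^k in GF(2^3)
LOG = {v: k for k, v in enumerate(EXP)}


def gf8_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def gf8_div(a, b):
    """Divide nonzero a by nonzero b by subtracting exponents."""
    return EXP[(LOG[a] - LOG[b]) % 7]


def syndromes(word):
    """Return (S0, S1) with S_j = sum_i r_i * alpha^(i*j)."""
    n = len(word)
    s0 = s1 = 0
    for offset, sym in enumerate(word):
        position = n - 1 - offset
        s0 ^= sym
        s1 ^= gf8_mul(sym, EXP[position % 7])
    return s0, s1


def correct_single_error(word):
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word)                 # no error detected
    if s0 == 0 or s1 == 0:
        raise ValueError("not a single-symbol error")
    k = LOG[gf8_div(s1, s0)]              # alpha^k = S1 / S0
    fixed = list(word)
    fixed[len(word) - 1 - k] ^= s0        # add the error magnitude back
    return fixed


received = [3, 4, 6, 0, 1, 3, 1]          # error e = alpha at position 1
print(syndromes(received))
print(correct_single_error(received))
```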
13
Correcting Multiple Errors
To correct for 2 symbol errors, need 4 checks: 2 positions + 2 error magnitudes
Decoder evaluates 4 syndromes
But it is difficult to work out the error magnitudes and error locations from them
Simultaneous equations in GF (the so-called "key equations")
Alternatives are trial-and-error solutions, known as the Berlekamp-Massey algorithm and the Euclidean algorithm
14
Practical RS Decoding
1. Syndrome computation
$S_j = \sum_{i=0}^{m-1} p_i \alpha^{i \cdot j}$
2. Key equation
$\Lambda_j \cdot S_j = \Omega_j \bmod (\alpha^m)$
Λ_j – error locator, Ω_j – error evaluator polynomials
3. Compute error locations and error values from Λ_j, Ω_j
Error locations – Chien's search
Error values – Forney's algorithm
15
Decoder Design
16
DVD RS Code
Product RS code
Chang, JSSC 02/01
17
Decoding Sequence
18
Syndrome Calculation
$S_j = \sum_{i=0}^{m-1} p_i \alpha^{i \cdot j}$
Syndrome calculation is straightforward
[Figure: syndrome cell and syndrome calculator cell. Chang, JSSC 02/01]
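One common way to evaluate S_j one received symbol at a time is Horner's rule: multiply the running value by α^j, then add the next symbol. The sketch below uses the small GF(2^3) field from the earlier slides rather than GF(2^8), and its names are illustrative. It is not necessarily how the cited design implements its syndrome cell.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # EXP[k] = alpha^k in GF(2^3)
LOG = {v: k for k, v in enumerate(EXP)}


def gf8_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def syndrome(received, j):
    """S_j = sum_i r_i * alpha^(i*j) via Horner's rule.

    received[0] is the symbol at the highest position, so symbols can be
    consumed in arrival order with one multiply and one XOR per symbol.
    """
    root = EXP[j % 7]
    acc = 0
    for sym in received:
        acc = gf8_mul(acc, root) ^ sym    # multiply-accumulate step
    return acc


word = [3, 4, 6, 0, 1, 3, 1]
print(syndrome(word, 0), syndrome(word, 1))
```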
19
Solving the Key Equation
Iterative solutions using Berlekamp-Massey or Euclid’s algorithm
Both require GF multiplication and division in each iteration
Euclidean algorithm calculates the greatest common divisor of two polynomials (S_j and α^2)
Solves the key equation by iterative polynomial division and multiplication
Division-free Euclidean algorithm [Shao, CICC'85]
Inversionless Berlekamp-Massey algorithm [Shayan, TCom'93]
Replaces polynomial division with cross-multiplication
20
Berlekamp-Massey Computation
Song, ISSCC’02
21
Modified Euclidean Computation
Song, ISSCC’02
22
Modified Euclidean Algorithm
Song, JSSC 11/02
23
Chien Search
Chien search evaluates polynomials
Seki, CICC’01
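As a rough illustration of what the search does, the sketch below evaluates an error locator polynomial at every field element and reports where it is zero. It works in GF(2^3) and assumes the convention Λ(x) = Π(1 + α^k x), whose roots are α^(-k) for the error positions k. Both the convention and the names are assumptions for this example, not details from the cited paper.

```python
EXP = [1, 2, 4, 3, 6, 7, 5]               # EXP[k] = alpha^k in GF(2^3)
LOG = {v: k for k, v in enumerate(EXP)}


def gf8_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]


def poly_eval(coeffs, x):
    """Evaluate a polynomial; coeffs[0] is the constant term."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf8_mul(acc, x) ^ c
    return acc


def chien_search(locator, n=7):
    """Return every position k with locator(alpha^-k) == 0."""
    return [k for k in range(n)
            if poly_eval(locator, EXP[(-k) % 7]) == 0]


# Errors at positions 1 and 4:
# (1 + alpha x)(1 + alpha^4 x) = 1 + (alpha + alpha^4) x + alpha^5 x^2
locator = [1, EXP[1] ^ EXP[4], EXP[5]]
print(chien_search(locator))
```

A hardware Chien search typically evaluates all positions with a bank of constant multipliers, one step per clock, instead of re-evaluating the polynomial from scratch.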
24
Why Use FEC?
Relax component requirements
Improve distance
3dB can double the reach of DSL
Increase capacity (rate)
3dB can improve the rate by 50% in QPSK
Reduce system cost
Improve system quality
Only total system evaluation can determine whether FEC is worth implementing in high-speed links
25
Example RS Decoders
Song, JSSC 11/02, 10Gb/s, 4 x 2.5Gb/s, 83MHz, 0.16µm CMOS, 340mW
Song, JSSC 11/02, 40Gb/s, 112MHz, 0.16µm CMOS, 360mW
Seki, CICC'01, 10Gb/s, 0.18µm CMOS, 350mW
RS decoders are typically a part of bigger chips (10mm x 10mm) that include framing, Mux/Demux, 1-10W
26
OC-192 RS CoDec
Seki, CICC’01
27
Performance of RS Decoders
Azadet, JSSC 3/02
28
Parity Checks
(7, 4) code, 4 user bits and 3 parity checks
d6, d5, d4, d3, p2, p1, p0
Parity checks:
p0 = d3 + d5 + d6
p1 = d3 + d4 + d5
p2 = d3 + d4 + d6
The code has a distance of 3, and can therefore correct 1 error
29
Parity Checks
Parity check equations
d6 + d5 + d3 + p0 = 0
d5 + d4 + d3 + p1 = 0
d6 + d4 + d3 + p2 = 0
Parity-check matrix (columns d6, d5, d4, d3, p2, p1, p0)
1 1 0 1 0 0 1
0 1 1 1 0 1 0
1 0 1 1 1 0 0
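The (7, 4) code is small enough to sketch completely. The rows of `H` below follow the parity definitions p0 = d3 + d5 + d6, p1 = d3 + d4 + d5 and p2 = d3 + d4 + d6, with columns ordered d6, d5, d4, d3, p2, p1, p0. The function names are illustrative.

```python
H = [
    [1, 1, 0, 1, 0, 0, 1],   # d6 + d5 + d3 + p0 = 0
    [0, 1, 1, 1, 0, 1, 0],   # d5 + d4 + d3 + p1 = 0
    [1, 0, 1, 1, 1, 0, 0],   # d6 + d4 + d3 + p2 = 0
]


def encode(d6, d5, d4, d3):
    p0 = d3 ^ d5 ^ d6
    p1 = d3 ^ d4 ^ d5
    p2 = d3 ^ d4 ^ d6
    return [d6, d5, d4, d3, p2, p1, p0]


def syndrome(word):
    """Evaluate every parity check modulo 2."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


def correct(word):
    """Fix at most one flipped bit by matching the syndrome to a column of H."""
    s = syndrome(word)
    if not any(s):
        return list(word)
    columns = list(zip(*H))
    fixed = list(word)
    fixed[columns.index(s)] ^= 1
    return fixed


codeword = encode(1, 0, 1, 1)
corrupted = list(codeword)
corrupted[2] ^= 1                     # flip d4
print(codeword, syndrome(corrupted), correct(corrupted))
```

Single-error correction works because all seven columns of H are distinct and nonzero, so each single-bit error produces its own syndrome.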
30
Low-Density Parity-Check Code
Parity codes based on sparse H matrices
Discovered in 1960, rediscovered in the 1990s
Exceptional error correcting capability
Within 0.01dB of Shannon bound
Efficient decoding based on message passing
Part of almost every new communications standard, like ADSL2+, 10GBase-T
31
Message Passing Analogy (Trellis)
How many people are there?
1 2 3 4 5 6 7 8
8 7 6 5 4 3 2 1
32
Message Passing Analogy (Tree)
[Figure: tree of nodes exchanging count messages along each edge]
Output @ each node = (Marginalized sum of inputs + 1)
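The counting rule can be tried on a small tree. In the sketch below, the message a node sends to one neighbour is 1 plus the sum of the messages it received from all its other neighbours. The five-node tree and the function names are hypothetical, chosen only to illustrate the rule.

```python
def count_messages(adjacency):
    """Compute msg[(u, v)]: the count node u reports to neighbour v.

    msg[(u, v)] = 1 + sum of messages into u from every neighbour except v.
    The recursion terminates because the graph is a tree (no cycles).
    """
    msg = {}

    def send(u, v):
        if (u, v) not in msg:
            msg[(u, v)] = 1 + sum(send(w, u) for w in adjacency[u] if w != v)
        return msg[(u, v)]

    for u in adjacency:
        for v in adjacency[u]:
            send(u, v)
    return msg


def total_at(node, adjacency, msg):
    """Each node learns the total: itself plus all incoming messages."""
    return 1 + sum(msg[(w, node)] for w in adjacency[node])


# Hypothetical tree: a - b, b - c, b - d, d - e
tree = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b", "e"], "e": ["d"]}
messages = count_messages(tree)
print({n: total_at(n, tree, messages) for n in tree})
```

Every node arrives at the same total although no node ever sees the whole tree, which is the property LDPC message passing relies on.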
33
LDPC: Overview
[Figure: bi-partite graph connecting bits X1, X2, X3, ..., XN to parity checks]
• LDPC representation by either parity check matrix or the bi-partite graph.
• Message passing: Bit-to-Check / Check-to-Bit
• Total Number of Edges: 18432
Parity-check matrix:
1 0 0 0 … 0
0 1 0 0 … 1
0 1 1 0 … 0
1 0 0 1 … 0
… … … …
0 0 1 0 … 1
Wc = 512
Wr = 4608
Column Weight = 4
Row Wt. = 36
[GALLAGER R. G., IRE Trans. Info. Theory, Vol. 8(1962) p. 21]
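The quoted figures are mutually consistent, which a short calculation confirms. The sketch assumes that 512 is the number of parity checks (matrix rows) and 4608 the number of bits (matrix columns). The design rate shown holds only if all checks are linearly independent, which the slide does not state.

```python
rows, cols = 512, 4608            # assumed: parity checks x bits
col_weight, row_weight = 4, 36    # ones per column / per row

edges_by_cols = cols * col_weight     # each 1 in H is one graph edge
edges_by_rows = rows * row_weight
design_rate = 1 - rows / cols         # assumes independent checks

print(edges_by_cols, edges_by_rows, round(design_rate, 3))
```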
34
Parallel LDPC Decoder Architecture
[Figure: variable-node processing elements PEv1, PEv2, PEv3, PEv4, ..., PEvN and check-node processing elements PEc1, PEc2, ..., PEcM connected through an Interconnect Fabric]
Fully parallel structure
e.g. satellite receiver: 64,000 variable node processing elements, PEv
Variable number of check node processing elements, PEc
High throughput; low power
Routing complexity [A. Blanksby and C. J. Howland, JSSC 2002]
35
Parallel LDPC Decoder Architecture
1Gb/s, 690mW, 1024-bit blocks, rate-1/2
Routing density 50%, 50mm^2 chip
Blanksby, JSSC 3/02
36
Bit-to-Check Message Computation
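[Figure: bit node B_i connected to check nodes C1, C2, C3, C4, with incoming messages R_{i,1}, R_{i,2}, R_{i,3}, R_{i,4} and outgoing message Q_{i,4}]
Q_{i,4} = R_{i,1} + R_{i,2} + R_{i,3}
[Figure: adder tree taking the prior probability and R_{i,j1}, R_{i,j2}, R_{i,j3}, R_{i,j4} through blocks labeled D to produce Q_{i,j1}, Q_{i,j2}, Q_{i,j3}, Q_{i,j4}]
Parallel Tree-Adder Structure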
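A software view of the bit-node update: the message sent to one check is the sum of the messages from all the other checks. The sketch assumes log-likelihood-style messages and also adds the prior, as the "Prior Prob" input in the figure suggests. The trick of forming one total and subtracting each input is an implementation choice for this example, not a description of the adder tree.

```python
def bit_to_check(prior, r_in):
    """Return the outgoing message for every edge of one bit node.

    Q[k] = prior + sum of all incoming R except R[k].
    """
    total = prior + sum(r_in)
    return [total - r for r in r_in]


# With a zero prior, the last output matches Q_{i,4} = R_{i,1} + R_{i,2} + R_{i,3}
print(bit_to_check(0, [1, 2, 3, 10]))
```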
37
Check-to-Bit Computations
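[Figure: check node C_j connected to bit nodes B_i, B2, B3, ..., B36, with incoming messages Q_{1,j}, Q_{2,j}, Q_{3,j}, ..., Q_{36,j} and outgoing message R_{1,j}]
R_{1,j} = f(Q_{2,j}, Q_{3,j}, ..., Q_{36,j})
$f(Q_1, Q_2, \ldots, Q_N) \approx \prod_i \mathrm{sgn}(Q_i) \cdot \min_i |Q_i|$
[Figure: Φ blocks and lookup tables (LUT) with a 36-cycle delay, converting Q_{i,j} into R_{i,j}]
Recursive-Serial Adder Structure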
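The min-sum approximation of f can be written directly. The sketch below takes a straightforward approach: for each edge it combines all the other incoming messages. It does not model the recursive-serial hardware structure, and the function name is illustrative.

```python
def check_to_bit_min_sum(q_in):
    """Min-sum check node update for one check node.

    For each edge k, the output is the product of the signs of all other
    inputs times the smallest magnitude among them. Needs >= 2 inputs.
    """
    out = []
    for k in range(len(q_in)):
        others = q_in[:k] + q_in[k + 1:]
        sign = 1
        for q in others:
            if q < 0:
                sign = -sign
        out.append(sign * min(abs(q) for q in others))
    return out


print(check_to_bit_min_sum([2, -3, 5, -1]))
```

A practical decoder would track only the two smallest magnitudes and the overall sign, which yields the same outputs with far less work for a check of degree 36.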
38
LDPC Code Performance
BER performance of LDPC depends on
Graph girth
Code expansion + block size
Hamming distance
"Random" parity-check matrices usually achieve good performance
The interconnect network is random, too
A 1024-bit LDPC decoder in 0.15µm occupies 7mm x 7mm with 50% density [Blanksby, Howland, JSSC'02]
Structured codes achieve good performance with structured interconnect
Permutation codes, Ramanujan and Cayley graphs
39
Structured LDPCs
Bit node groups
Check node groups
Construction based on Ramanujan graphs allows for hierarchical decomposition and good performance
40
Serial LDPC Decoder Architecture
[Figure: timing diagram in which nodes 1-8 and A, B, C are processed one after another over time; datapath with PEC, PEV and SRAM]
Latency dependent on total number of nodes
Messages are stored in SRAM
Large memory requirement
Natural structure for microprocessors, DSPs, etc.
Parallelizing computation with limited PEs
[G. Al-Rawi, J. Cioffi, and M. Horowitz, ITCC 2001]
41
Staggered Serial LDPC Decoder
[Figure: timing diagram in which nodes 1-8 and A, B, C are updated in staggered order over time; datapath with one PEC, several PEVs and a MUX]
Increase number of variable node PEs
Staggered message updating
reduced complexity of PEv
Messages stored in variable node PEs
reduced memory requirement
Improved BER @ reduced iteration counts
[E. Yeo, et al. Globecom2001]
42
Pipelined Architecture of LDPC Decoder
[Figure: Bit to Check and Check to Bit processing blocks connected through four MEMORY banks]
1. Randomness of connectivity in bi-partite graph inhibits any kind of memory reuse.
2. Two banks of memory alternating between read and write required.
3. Total memory requirement: 72k words
43
LDPC Codes Based on Galois Fields
[Y. Kou, et al. ISIT 2000]
• Codes based on GF projections are low rate.
  • No cycles of length 4 (short loop)
  • Cyclic rows
  • e.g. (1023 x 1023) code has rate of 0.68
• Column splitting
  • Each column in original matrix is split into four
  • Non-zero entries in original column are cycled through the 4 new columns
  • e.g. (1023 x 4092) code has rate of 0.75
  • Partial loss of regularity (cyclic structure)
  • Complex O(N^2) encoding
• Puncturing
  • Truncate height of PC matrix
  • Columns in the maximum zero runlength region correspond to parity bit locations
  • Cyclic encoding using direct application of PC matrix now possible
44
Shift Register-Based Implementation
• Staggered decoding.
• Regularity of codes based on Finite Field geometries.
[Figure: shift register holding messages Q0 ... Q4092, tapped in groups of four (Q0-Q3, Q29-Q32, ..., Q4081-Q4084, Q4085-Q4088, Q4089-Q4092) through adders into the Check-to-Bit Message Computation Block]
[E. Yeo, et al. Globecom2001]
45
LDPC Performance
Djurdjevic, CommLetters 07/03
46
LDPC Error Floors
Richardson, Allerton’03