7/30/2019 BCJR Material Text
1/81
REDUCED-COMPLEXITY ALGORITHMS FOR DECODING AND
EQUALIZATION
A Dissertation
Submitted to the Graduate School
of the University of Notre Dame
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
by
Marcin Sikora, M.S.
Daniel J. Costello, Jr., Director
Graduate Program in Electrical Engineering
Notre Dame, Indiana
May 2008
This document is in the public domain.
REDUCED-COMPLEXITY ALGORITHMS FOR DECODING AND
EQUALIZATION
Abstract
by
Marcin Sikora
Finite state machines (FSMs) and detection problems involving them are frequently encountered in digital communication systems for noisy channels. One type of FSM arises naturally in transmission over band-limited frequency-selective channels, when bits are modulated onto complex symbols using a memoryless mapper and passed through a finite impulse response (FIR) filter. Another type of FSM, mapping sequences of information bits into longer sequences of coded bits, is the convolutional code. The detection problem for FSMs, termed decoding in the context of convolutional codes and equalization for frequency-selective channels, involves either finding the most likely input sequence given noisy observations of the output sequence (hard-output decoding) or determining the a posteriori probability of individual information bits (soft-output decoding). These problems are commonly solved either by running a search algorithm on the tree representation of all FSM sequences or by means of dynamic programming on the trellis representation of the FSM.
This work presents novel approaches to decoding and equalization based on tree search. For decoding of convolutional codes, two novel supercode heuristics are proposed to guide the search procedure, reducing the average number of visited
incorrect nodes. For soft-output decoding and equalization, a new approach to the generation of soft output within the M-algorithm-based search is presented. Both techniques, when applied simultaneously, yield a particularly efficient soft-output decoder for large-memory convolutional codes. Finally, a short block code is presented which, repeated and concatenated with a strong outer convolutional code, yields an iteratively decodable coding scheme with excellent convergence and minimum distance properties. With the help of the proposed soft-output decoder for the outer convolutional code, this concatenation also has low decoding complexity.
CONTENTS

FIGURES
TABLES
ACKNOWLEDGMENTS
CHAPTER 1: INTRODUCTION
  1.1 Problem background
  1.2 Previous work
  1.3 Contribution and outline
CHAPTER 2: SUPERCODE HEURISTICS FOR TREE SEARCH DECODING
  2.1 Introduction
    2.1.1 Tree search decoding
    2.1.2 The algorithms A and A*
    2.1.3 Supercode heuristics
  2.2 Construction and trellis representation of supercodes
  2.3 Supercode A*-type heuristic for ML decoding
  2.4 Supercode A-type heuristic for sub-ML decoding
  2.5 Simulation results
  2.6 Summary
CHAPTER 3: SOFT-OUTPUT EQUALIZATION WITH THE M-BCJR ALGORITHM
  3.1 Introduction
  3.2 Communication system
  3.3 SISO equalization
    3.3.1 The BCJR algorithm
    3.3.2 The RS-BCJR algorithm
    3.3.3 The M-BCJR algorithm
  3.4 The M-BCJR algorithm
  3.5 Simulation results
  3.6 Summary
CHAPTER 4: SERIAL CONCATENATIONS WITH SIMPLE BLOCK INNER CODES
  4.1 Introduction
  4.2 Soft-output decoding of the GSPC code
  4.3 Bounds on ML performance of SCCs with an inner GSPC code
    4.3.1 An idealized interleaver
    4.3.2 A uniform interleaver
    4.3.3 Comparison with simulation results
  4.4 EXIT chart analysis for GSPC codes
  4.5 Design examples
  4.6 Conclusions
CHAPTER 5: SOFT-OUTPUT DECODING OF CONVOLUTIONAL CODES WITH THE M-BCJR ALGORITHM
  5.1 Introduction
  5.2 The communication system
  5.3 The M*-BCJR algorithm
    5.3.1 Algorithm description
    5.3.2 Impact of survivor selection on the performance of M*-BCJR
  5.4 Survivor selection with supercode heuristic
    5.4.1 Supercode heuristic
    5.4.2 Construction and trellis representation of supercodes
  5.5 Simulation results
  5.6 Conclusions
BIBLIOGRAPHY
FIGURES

2.1 Average number of path extensions per coded bit performed by the stack algorithm with the ML supercode heuristic h_S.
2.2 Average number of path extensions per coded bit performed by the stack algorithm with the sub-ML supercode heuristic h_S.
3.1 Communication system with turbo equalization.
3.2 Part of the system to be soft-inverted by the SISO equalizer.
3.3 Trellis section a) before and b) after merging an excess state s̃_i into a surviving state s_i.
3.4 Bit error rate of M-BCJR and RS-BCJR for a) scenario 1 (BPSK) and b) scenario 2 (16QAM).
3.5 Number of states vs. Eb/N0 required to reach the reference Pe for a) scenario 1 (BPSK, Pe = 10^-4) and b) scenario 2 (16QAM, Pe = 10^-3).
4.1 Serially concatenated coding with an inner block code.
4.2 Generalized single parity check encoder.
4.3 Trellis for soft-output decoding of the generalized parity check code.
4.4 Comparison of simulation results and BER bounds for the 2-state, rate 1/2 outer convolutional code.
4.5 Dependence between the GSPC code parameters and the EXIT curve shape.
4.6 EXIT charts for a) SCC 1 with 16-state outer code, b) SCC 2 with 256-state outer code.
4.7 BER performance and uniform interleaver limit for a) SCC 1 with 16-state outer code, b) SCC 2 with 256-state outer code.
5.1 Serial concatenation of strong outer CC and inner GSPC.
5.2 Performance of an iteratively decoded SCC with (2,1,8) outer CC decoded using standard M*-BCJR after 8 iterations.
5.3 Performance of the M*-BCJR algorithm aided by a genie in survivor selection.
5.4 Performance of the M*-BCJR algorithm with the supercode heuristic for a) M = 16 and b) M = 32.
TABLES

2.1 SUPERCODE PARAMETERS
3.1 SIMULATED TURBO-EQUALIZATION SCENARIOS
ACKNOWLEDGMENTS
First, I would like to sincerely thank my parents and family back home in
Poland. Without their love, inspiration, and constant support for my actions and
endeavors, my Ph.D. studies at Notre Dame and this thesis would never have
come to be.
I would like to express my gratitude to my advisor, Professor Daniel Costello,
for his guidance and support throughout my studies. It was his encouragement,
insight, and clever use of deadlines that each time led me from the desert of
failed ideas and self-doubt to the promised land of novelty and excitement. He
has shaped me both as a researcher and as a person, and words cannot
sufficiently express my thanks.
I am thankful to Professors Nicholas Laneman, Martin Haenggi, Thomas Fuja,
and Oliver Collins for their support and helpful discussions throughout my studies.
Finally, I would like to thank my friends at Notre Dame, especially Ali Pusane,
Christian Koller, and Faruck Morcos, for making my stay in South Bend an
unforgettable experience.
CHAPTER 1
INTRODUCTION
1.1 Problem background
The amount of information that our world digitally gathers, transfers, stores,
and processes each day is rapidly growing. As we push the physical limits of fiber
optic cables, radio spectrum, magnetic disks, and silicon memory, these storage
and communication media become increasingly unreliable, corrupting the information
with random noise and complex distortions. The goal of communication theory
is the design of communication systems that allow reliable recovery of the
original information, represented as a sequence of bits, with complexity, latency,
and chance of error kept as small as possible.
Within such systems, the transmitter is responsible for performing the mapping
from the set of all possible information sequences into a set of bits, symbols,
or waveforms suitable for transmission over the channel, while the receiver
establishes the most probable information sequence based on the observed channel
output.
put. The overall reliability of communication, measured in terms of average rate
of erroneously recovered bits (bit error rate, BER), depends on both these ele-ments. If the mapping implemented by the transmitter does not separate distinct
sequences far enough in the signal space, the channel is likely to make the received
signal appear more similar to an incorrect signal than the actually transmitted one.
Additional errors arise if the receiver algorithm does not perform maximum
likelihood (ML) detection, but rather an approximation of it.
It is not difficult to find good transmitter mappings which, combined with an
ML receiver, provide very reliable transmission. In fact, in his landmark paper
[29], Shannon demonstrated that mappings generated randomly have a very high
chance of being very reliable. It is much more difficult to find ones with transmit-
ters and receivers of reasonable complexity. Since, in general, the computational
complexity of the receiver is larger than that of the transmitter, it is the availability
of efficient detection algorithms that ultimately limits the achievable performance.
This fact is the main motivation for development of new reduced-complexity detec-
tion algorithms which, at the same computational cost, can handle more complex
detection tasks, leading to an improved BER.
The main results presented in this thesis are novel reduced-complexity tech-
niques for solving the detection problems involving finite state machines (FSMs)
with noise-corrupted outputs. The two main cases when such FSMs arise in
communication systems are convolutional codes (CC) [7, 19] and inter-symbol in-
terference (ISI) channels [26]. The detection problems for CCs will be referred to
as decoding and those for ISI channels as equalization. While quite different in
origin, both of these FSMs can be decoded and equalized using very similar
algorithms. This is because the set of all possible output sequences of an FSM can be
combinatorially represented by a tree or a trellis.
The two most important detection problems associated with FSMs are maximum
likelihood sequence detection (MLSD) and a posteriori probability (APP)
computation. The former involves finding the most likely input sequence (or
output sequence) given the sequence of channel outputs and is often referred to as
hard-output detection. Hard-output decoding is particularly desirable if the re-
sulting sequence of information bits is the immediate input to the higher layers of
the communication systems, as MLSD guarantees minimum probability of block
error. However, MLSD does not provide any reliability information about individual
bits in the sequence, a disadvantage in concatenated systems where the detector
output is fed as input to an outer decoder. In such cases the computation of the APP
for each individual bit in the information sequence, termed soft-output detection
[13], is more advantageous. Soft-output decoders are the central element of turbo
decoders [4], in which parallel or serial concatenations of interleaved codes are
decoded by iteratively performing soft-output decoding of the component codes.
Similarly, soft-output equalizers can be used to provide reliability information to
the decoders or for turbo-equalization [18].
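The contrast between hard-output (MLSD) and soft-output (APP) detection can be made concrete with a toy brute-force sketch. Everything below (the length-3 code, the noise values, the BPSK/Gaussian channel model) is invented for illustration and is not material from this thesis:

```python
import math

# Toy code: four codewords of length 3 (an assumption of this example).
code = [(0, 0, 0), (1, 1, 1), (1, 0, 1), (0, 1, 0)]

def branch_loglik(y, x, sigma=1.0):
    # log2 P(y | x) for BPSK (bit 0 -> +1, bit 1 -> -1) in Gaussian noise,
    # up to an additive constant that cancels in comparisons.
    s = 1.0 - 2.0 * x
    return -((y - s) ** 2) / (2 * sigma ** 2 * math.log(2))

def seq_loglik(y_seq, x_seq):
    return sum(branch_loglik(y, x) for y, x in zip(y_seq, x_seq))

y = (0.9, -0.2, -1.1)  # invented noisy observations

# Hard output (MLSD): the single most likely codeword.
x_ml = max(code, key=lambda x: seq_loglik(y, x))

# Soft output (APP): per-bit posteriors P(x_i = 1 | y), by brute force
# over all codewords (feasible only for tiny codes).
weights = [2.0 ** seq_loglik(y, x) for x in code]
total = sum(weights)
app = [sum(w for w, x in zip(weights, code) if x[i] == 1) / total
       for i in range(3)]
print(x_ml, [round(p, 3) for p in app])
```

The hard decision commits to one sequence, while the APPs retain per-bit reliability that an outer decoder could exploit.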
1.2 Previous work
The classical solutions to the FSM detection problems can be divided into
trellis- and tree-based algorithms. The Viterbi [39] and the Bahl-Cocke-Jelinek-Raviv (BCJR) [2] algorithms use the trellis representation of the set of all pos-
sible input-output FSM sequence mappings and dynamic programming to solve
the MLSD and APP problems, respectively. They are guaranteed to obtain their
solutions in a fixed amount of time independent of the noise level, but their
complexity scales linearly with the size of the trellis, which is exponential in the
memory length of the FSM. The Viterbi and BCJR algorithms have been successfully
used with CCs and ISI channels with trellis sizes up to a couple of hundred, or in rare
cases a couple of thousand, states per time unit, but their application to still
larger trellises is rarely justifiable in terms of complexity.
The tree-search algorithms are well suited for solving the MLSD problem, and
an entire class of sequential algorithms [1] has been developed. The most notable
examples are the Fano algorithm [8], the stack algorithms [17, 41], and the M-
algorithm [1]. All these techniques attempt to find the ML sequence by searching
through the tree of partial sequences, only following the most promising paths.
As a result the decoding time is variable and can be much shorter than that of
the Viterbi algorithm at high signal-to-noise ratios (SNRs), even for FSMs with
large memory. At low SNRs, however, the decoding time of sequential algorithms
rapidly increases, limiting their utility.
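The best-first flavor of these techniques can be sketched in a few lines; this is a generic stack-algorithm skeleton under invented assumptions (uncoded BPSK tree, accumulated log-likelihood as the metric, no Fano-style bias term), not any specific decoder from the literature above:

```python
import heapq

def stack_decode(y, depth, metric, max_visits=10000):
    """Best-first (stack) search over the binary tree of partial paths.
    `metric(path)` scores a partial path; higher is better. Returns the
    best full-length path found, or None if the visit budget is exhausted
    (the low-SNR failure mode noted in the text)."""
    heap = [(-metric(()), ())]        # heapq is a min-heap, so negate
    visits = 0
    while heap and visits < max_visits:
        neg_f, path = heapq.heappop(heap)
        visits += 1
        if len(path) == depth:
            return path               # best-so-far full path pops first
        for bit in (0, 1):            # extend the best path by one branch
            ext = path + (bit,)
            heapq.heappush(heap, (-metric(ext), ext))
    return None

# Toy use: metric = accumulated squared-error log-likelihood for BPSK.
y = [0.7, -1.2, 0.4, -0.3]
def metric(path):
    return sum(-(yi - (1 - 2 * b)) ** 2 for yi, b in zip(y, path))

best = stack_decode(y, len(y), metric)
print(best)
```

Because every extension can only lower this metric, the first full-length path popped from the heap is optimal; with a realistic code, the metric would follow branches of the code tree rather than all bit pairs.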
Just as sequential decoding can be perceived as a reduced-complexity
alternative to Viterbi decoding, there have been attempts to develop less complex
algorithms capable of generating soft outputs. The M-BCJR [9] and the LISS [12]
algorithms use partial paths generated by a sequential procedure to generate soft
outputs. The Soft-Output Viterbi Algorithm (SOVA) [11] augments the Viterbi
algorithm to obtain bit reliability information. Reduced-State BCJR (RS-BCJR)
[5], applicable only to the equalization problem, executes the BCJR algorithm on
a trellis with shortened FSM memory. The BEAST algorithm [21] generates soft
output from the list of several most likely input sequences. The price for the com-
plexity reduction provided by these techniques is lower quality of the soft outputs,
which causes higher overall BER and negatively impacts the convergence of turbo
decoding methods.
1.3 Contribution and outline
This work describes novel hard-output and soft-output detection techniques for
FSMs which operate at lower computational complexities than previous methods.
The first of these methods, presented in Chapter 2, extends the classical sequen-
tial decoding to encompass heuristic tree search. The use of appropriate path cost
heuristics can lead to faster tree search at a cost of an additional fixed computa-
tional burden. A particularly effective mechanism for obtaining such heuristics,
based on the concept of a supercode, leads to overall complexity reduction of
hard-output decoding of convolutional codes at low SNR.
Chapter 3 presents a soft-output algorithm called M-BCJR, partially based
on the sequential M-algorithm. The M-BCJR algorithm dynamically builds a
reduced-complexity trellis with a low number of states and executes the BCJR
algorithm on it. The main novelty of this method is the trellis construction process,
which utilizes a specially designed state-absorbing operation and state distance
metric. The algorithm is particularly well suited for turbo equalization, offering a
better performance-complexity tradeoff than existing methods. The limitations of M-
BCJR as a soft-output decoder for CCs with larger memory are explored and
subsequently remedied in Chapter 5. It is shown that the primary performance
limitation is the inaccurate survivor state selection during the construction of
the reduced trellis. It is further shown that the supercode heuristic introduced
in Chapter 2 can be successfully used to improve the selection accuracy, hence
improving the quality of soft outputs.
The results pertaining to soft-output decoding of large-memory CCs are preceded
by Chapter 4, which discusses the applicability of such codes as components of
turbo codes. Traditionally, turbo codes are constructed from convolutional codes
with short memory, allowing efficient BCJR decoding and generally leading to
good convergence of the iterative decoding. However, when shorter block lengths
are required, such turbo codes can have relatively poor error floor performance
due to low minimum distance. Chapter 4 presents a serial concatenation of a
strong outer CC and a novel inner block code, which assures good minimum distance
and excellent convergence. Furthermore, this scheme greatly benefits from
the application of the M-BCJR algorithm to the decoding of the outer code.
CHAPTER 2
SUPERCODE HEURISTICS FOR TREE SEARCH DECODING
2.1 Introduction
2.1.1 Tree search decoding
After the invention of convolutional codes (CCs) by Elias [7], the first practical
approach to decoding them was a tree search technique due to Wozencraft [40].
Wozencraft's decoder attempts to follow the most promising paths through the
tree of partial codewords until a complete codeword is found, and it achieves
this by performing two primary tasks: calculating path metrics for visited partial
codewords and generating new paths by extending old paths with high metrics.
This tree search principle, accompanied by the requirement that the metric for paths
of length L (in coded bits) depend only on the first L received channel symbols,
is referred to as sequential decoding. Despite many improvements to Wozencraft's
original algorithm proposed in the literature, both to the path metric computation
[8] and to the path extension procedure [1, 8, 17, 41], the ultimate limit to sequential
decoding is the computational cutoff rate, a rate above which the average number
of visited paths (and hence computations) is unbounded. This limitation precludes
sequential decoding from practical operation at rates close to capacity. However,
as we demonstrate in this chapter, this limitation need not apply to tree search
decoding in general when path metrics utilize the entire received sequence.
2.1.2 The algorithms A and A*
One of the best understood tree search decoding algorithms is the Zigangirov-
Jelinek stack algorithm [17, 41], which is a special case of the best-first algorithm
A for informed tree search studied in the computer science literature [16, 24, 25,
28]. When the A algorithm is applied to a shortest (or longest) path problem, the
above references offer a standard approach to designing path metrics. In terms of
the maximum likelihood (ML) decoding problem, defined as

    x_ML = arg max_{x ∈ C} Σ_{i=1}^{N} log2 P(y_i | x_i),    (2.1)

the metric for a partial codeword x_{1,L} should have the general form

    f(x_{1,L}) = g(x_{1,L}) + h(x_{1,L}),    (2.2)

where the first term measures the path cost accumulated so far,

    g(x_{1,L}) = Σ_{i=1}^{L} log2 P(y_i | x_i),    (2.3)

and the second term is a heuristic estimate of the remaining cost of reaching
the end of the tree if the best extension of x_{1,L} is followed, i.e.,

    h(x_{1,L}) ≈ h_ideal(x_{1,L}),    (2.4)

    h_ideal(x_{1,L}) = max_{x ∈ C(x_{1,L})} Σ_{i=L+1}^{N} log2 P(y_i | x_i),    (2.5)

where C(x_{1,L}) is the set of codewords in C beginning with x_{1,L}. Depending on
the accuracy and properties of the heuristic function h(x_{1,L}), algorithms with
different computational complexities and probabilities of missing the ML path can
be obtained. In particular, if the exact value of hideal is used as a heuristic, the
algorithm A is guaranteed to find the ML codeword in N steps, i.e., it will never
diverge from the ML path. A practical heuristic, of course, should trade off some
of the accuracy for ease of computation, such that the joint computational effort
of obtaining the path metrics and searching the tree is as small as possible. The
authors of [25, 28] suggest that the general approach to obtaining good heuristics
is replacing the original problem defining hideal by relaxed versions that are simpler
to solve. The supercode heuristic introduced in this chapter adheres to this idea.
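As a concrete, if toy, illustration of the decomposition f = g + h in (2.2)-(2.5), the sketch below evaluates g and a brute-force h_ideal on an invented four-codeword block code; the code, observations, and BPSK/Gaussian channel model are assumptions of this example, not material from the thesis:

```python
import math

# Invented toy code and observations for illustration only.
code = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)]
y = (0.8, -0.6, 0.1, -0.9)

def loglik(yi, xi):
    # log2 P(yi | xi) for BPSK in Gaussian noise, up to a constant.
    return -((yi - (1 - 2 * xi)) ** 2) / (2 * math.log(2))

def g(prefix):
    # Accumulated cost (2.3) over the first len(prefix) symbols.
    return sum(loglik(yi, xi) for yi, xi in zip(y, prefix))

def h_ideal(prefix):
    # Exact heuristic (2.5): best completion among codewords with
    # this prefix, computed by brute force (feasible only for tiny codes).
    L = len(prefix)
    tails = [x[L:] for x in code if x[:L] == prefix]
    return max(sum(loglik(yi, xi) for yi, xi in zip(y[L:], t))
               for t in tails)

prefix = (1, 0)
f = g(prefix) + h_ideal(prefix)   # path metric (2.2)
print(round(f, 3))
```

Since the prefix (1, 0) has a unique completion in this toy code, f equals the full log-likelihood of the codeword (1, 0, 1, 0), exactly as the definition of h_ideal promises.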
A much-celebrated variant of the algorithm A with path metric (2.2), called
A*, is obtained when the heuristic function is not just an approximation, but an
upper bound, to (2.5), i.e.,

    h(x_{1,L}) ≥ h_ideal(x_{1,L}).    (2.6)
The above condition guarantees that A* always finds the ML solution to the
decoding problem. However, the optimality guarantee offered by A* is very costly
in terms of the number of paths visited by the search algorithm and heuristics
of similar complexity that aim at satisfying (2.4) rather than (2.6) conclude the
search much faster. The ML guarantee is also not essential for practical decoding,
as long as the decoding errors caused by missing the ML solution are less frequent
than the errors due to the actually transmitted codeword not being ML.
Our reason for mentioning the A* algorithm is the fact that, as we demonstrate in subsequent sections, our supercode approach naturally leads to heuristics
satisfying (2.6). Although these heuristics considerably reduce the number of com-
putations compared to previous A* decoding metrics proposed in the literature
[15], they still compare unfavorably to the standard Fano metric [8], which does
not satisfy (2.6). Similar observations have been made by other authors, who ap-
plied the concept of the A and A* algorithms to the sequential decoding problem
[14, 34]. The main result of this paper is a supercode heuristic that generalizes the
Fano metric, sacrificing (2.6) for a considerable reduction in the number of visited
paths and offering practical decoding at rates inaccessible to previously proposed
sequential algorithms.
2.1.3 Supercode heuristics
The heuristics proposed in this chapter involve solving the maximization (or
a problem similar to the maximization) of (2.5) for a supercode S(x_{1,L}) of the code
C(x_{1,L}) (i.e., S(x_{1,L}) ⊇ C(x_{1,L})), rather than for the code C itself. Before we discuss
when and why such a problem would be simpler, or even feasible, we first present
a practical way to implement the exact h_ideal for a terminated convolutional code.
Examination of (2.5) reveals that h_ideal(x_{1,L}) is only a function of the final state
s_L of the encoder that outputs x_{1,L}, and thus the total number of unique values of
h_ideal(s_L) is equal to the total number of states in the trellis representation of C.
Therefore, instead of computing h_ideal(x_{1,L}) each time the tree search procedure
visits a path x_{1,L}, the values of h_ideal(s_L) can be precomputed for all states before
the actual tree search begins. This precomputation can be performed by first
setting the metric of the final trellis state h_ideal(s_{N+1}) to zero and then recursively
computing the values for earlier states through add-compare-select operations,
i.e.,

    h_ideal(s_L) = max_{(s_L, x_L, s_{L+1})} ( log2 P(y_L | x_L) + h_ideal(s_{L+1}) ),    (2.7)
where the maximization is performed over all trellis branches starting at s_L. This
procedure can be immediately recognized as the Viterbi algorithm run backwards
(one that stores state metrics rather than survivor information, since it solves the max
rather than the arg max problem). If we additionally recall that algorithm A with an
ideal metric never diverges from the optimal path, acting just like the backtracking
procedure of the Viterbi algorithm run in the forward direction, we can conclude
that algorithm A with an ideal heuristic precomputed using (2.7) is equivalent to
the (inverted) Viterbi algorithm. We will subsequently refer to the ideal heuristic
precomputed in such a way as the Viterbi heuristic. Moreover, we will call such a
precomputation task a backward step, and the subsequent tree search a forward
step.
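The backward step (2.7) can be sketched as a Viterbi-style max recursion run from the final state toward the start. The two-state toy trellis, its branch labels, and the channel model below are invented for illustration; only the recursion itself mirrors (2.7):

```python
import math

# trellis[L] lists the branches (s_L, x_L, s_{L+1}) of section L.
# This 2-state, 3-section trellis is an invented example.
trellis = [
    [(0, 0, 0), (0, 1, 1)],
    [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1)],
    [(0, 0, 0), (1, 1, 0)],          # terminated into state 0
]
y = (0.5, -0.8, 0.3)                 # invented observations

def loglik(yi, xi):
    # log2 P(yi | xi) for BPSK in Gaussian noise, up to a constant.
    return -((yi - (1 - 2 * xi)) ** 2) / (2 * math.log(2))

N = len(trellis)
h = [dict() for _ in range(N + 1)]   # h[L][state] = h_ideal(s_L)
h[N][0] = 0.0                        # h_ideal at the final state is zero
for L in range(N - 1, -1, -1):       # backward add-compare-select (2.7)
    for s, x, s_next in trellis[L]:
        if s_next in h[L + 1]:
            cand = loglik(y[L], x) + h[L + 1][s_next]
            if s not in h[L] or cand > h[L][s]:
                h[L][s] = cand

# h[0][0] now equals the metric of the best path through the trellis,
# consistent with the equivalence to the Viterbi algorithm noted above.
print(round(h[0][0], 3))
```

With these tables stored, the forward tree search only needs a constant-time lookup of h_ideal(s_L) at each visited node.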
The actual practicality of the Viterbi heuristic is limited by the number of
states in the code trellis, which for convolutional codes grows exponentially with
the constraint length. Our approach, detailed in subsequent sections, relies on
finding supercodes that have a trellis representation with a lower number of states
than the original code. We present one such technique for codes of rate R ≥ 1/2,
based on deleting rows from the parity check matrix of the original code, in Section
2.2. In Section 2.3 we demonstrate how solving the likelihood maximization problem
over a supercode naturally leads to an upper bound on h_ideal, yielding an A*-type
heuristic. In Section 2.4 we further apply the concept of a supercode to generalize the
Fano metric and obtain a robust non-A* heuristic. We illustrate the performance
of the proposed techniques using a memory-8, rate-1/2 convolutional code as an
example in Section 2.5 and draw some conclusions in Section 2.6.
2.2 Construction and trellis representation of supercodes
Consider a rate k/n non-recursive convolutional code with total encoder memory
kν. If we terminate this code after K = kΛ information bits, we will obtain a
linear block code with block length N = n(Λ + ν). The set of all codewords can
then be defined either using the generator matrix G, i.e.,

    C = { x ∈ GF(2)^N | x = uG for some u ∈ GF(2)^K },    (2.8)

or using the parity check matrix H, i.e.,

    C = { x ∈ GF(2)^N | H x^T = 0 }.    (2.9)

The trellis representation of C can be obtained from either of these definitions,
although typically the generator matrix is used, since the resulting trellises
additionally store the encoder mapping from u to x. However, we found that for
the purpose of obtaining and representing supercodes, the parity check matrix is
more useful.
If h_j is the j-th row of the parity check matrix H, then the j-th parity check
equation is h_j x^T = 0. We will say that the parity check h_j is active at time L if
both h_{j;1,L} and h_{j;L+1,N} contain nonzero entries, and we use J(L) to denote the set
of such parity checks. For a given codeword x we can define a syndrome at time
L as a |J(L)|-tuple

    s_L = ( h_{j;1,L} x^T_{1,L} )_{j ∈ J(L)}.    (2.10)

The possible values that can be taken by the syndrome s_L correspond exactly to
the states at time L in the parity check trellis. For the terminated convolutional
code considered here, the number of parity checks active at times L = n, 2n, 3n, ...
is at most (n − k)ν, and the number of parity check trellis states does not exceed
2^{(n−k)ν}. It is worth pointing out that, since the analogous trellis based on (2.8)
has 2^{kν} states, the parity check trellis is less efficient at representing codes with
rates R < 1/2.
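A small sketch of the parity-check-trellis state in (2.10): for a given H, a check h_j is active at time L when it has nonzero entries both before and after position L, and the state is the partial syndrome restricted to the active checks. The (7,4) Hamming code below is only a convenient small example, not a code used in the thesis, and indices are 0-based:

```python
# Parity check matrix of the (7,4) Hamming code (example assumption).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def active_checks(H, L):
    """Indices j such that h_j has nonzero entries in both
    positions 0..L-1 and positions L..N-1 (the set J(L))."""
    return [j for j, h in enumerate(H)
            if any(h[:L]) and any(h[L:])]

def syndrome_state(H, x, L):
    """Partial syndrome (2.10): inner products (mod 2) of the
    truncated active rows with the path prefix x[:L]."""
    return tuple(sum(hj * xj for hj, xj in zip(H[j][:L], x[:L])) % 2
                 for j in active_checks(H, L))

x = [1, 0, 1, 1, 0, 1, 0]   # a codeword of the example code
print(active_checks(H, 3), syndrome_state(H, x, 3))
```

Two prefixes reach the same trellis state exactly when their partial syndromes agree, which is what lets the tree search map a path to a precomputed heuristic value.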
Let us now suppose that a fraction of the parity checks of H have been omitted.
If P is the set of parity check rows that have not been deleted, then the
resulting new linear block code has the form

    S_P = { x ∈ GF(2)^N | h_j x^T = 0 for all j ∈ P }.    (2.11)

Clearly S_P ⊇ C. Moreover, the parity check trellis of S_P has fewer states than the
original trellis, since only |P ∩ J(L)| parity checks are active at time L. In the
most extreme case of P = ∅, S_∅ = GF(2)^N and the parity check trellis has only
one state at each time index L.
In the case of convolutional codes, it is desirable that the number of trellis
states of the supercode does not vary with time. If we aim at constructing a
supercode that has at most 2^M states at times L = n, 2n, 3n, ..., the set P cannot
include more than M out of every consecutive set of (n − k)ν parities, i.e.,

    | { L', ..., L' + (n − k)ν − 1 } ∩ P | ≤ M    for all L' = 1, ..., Λ(n − k).    (2.12)

The easiest way to construct parity check sets with this property is to make them
repetitive with period (n − k)ν. In particular, we can initialize P to an arbitrary
M-element subset of {1, ..., (n − k)ν} and then keep adding elements j + (n − k)ν
whenever j is already in the set. Since in this construction the initial M-element
subset entirely defines P, it is convenient to represent P by a binary (n − k)ν-tuple
p, with p_j = 1 if j ∈ P and p_j = 0 otherwise.
2.3 Supercode A*-type heuristic for ML decoding
The concept of a supercode presented in the previous section naturally leads to
the following heuristic for tree search decoding:

    h_S(x_{1,L}) = max_{x ∈ S(x_{1,L})} Σ_{i=L+1}^{N} log2 P(y_i | x_i).    (2.13)

By comparing (2.13) to (2.5), it is immediately clear that h_S satisfies the A*
condition (2.6), ensuring ML decoding. Similar to the implementation of h_ideal
discussed in the introduction, the decoder will first precompute all unique values
of h_S by performing a Viterbi-like backward pass on the trellis representing S.
During the forward pass, the tree search procedure must be able to easily map
from the current path x_{1,L} to the appropriate state in the trellis for S to facilitate
the metric lookup operation. For supercodes S_P obtained by the parity check
deletion described in the previous section, this task is accomplished by

    x_{1,L} → s_{P,L} = ( h_{j;1,L} x^T_{1,L} )_{j ∈ J(L) ∩ P},    (2.14)

which need not be performed anew for each new path, but can be recursively
updated during path extension.
In the extreme case of S = GF(2)^N, the heuristic (2.13) simplifies to

    h_S(x_{1,L}) = Σ_{i=L+1}^{N} max_{x_i ∈ {0,1}} log2 P(y_i|x_i).    (2.15)

If we consider further the entire path metric f_S(x_{1,L}) obtained from (2.2) and (2.15) and subtract a term h_S(∅) independent of x_{1,L}, we obtain the equivalent metric

    f_0(x_{1,L}) = Σ_{i=1}^{L} [ log2 P(y_i|x_i) − max_{x_i ∈ {0,1}} log2 P(y_i|x_i) ].    (2.16)

In this form, the overall path metric does not depend on the received symbols beyond position L, and thus tree search decoding utilizing this metric can be regarded as a sequential decoder. Metric (2.16) is in fact equivalent to the one used in the Maximum Likelihood Sequential Decoding Algorithm (MLSDA) presented in [15]. At the other extreme, if S = C is used, the metric (2.13) yields the Viterbi algorithm, as described in the introduction.
2.4 Supercode A-type heuristic for sub-ML decoding
Despite guaranteeing ML decoding and a faster tree search compared to the
MLSDA, the A* heuristic presented in the previous section requires many more
node extensions than the simple, non-A* Fano metric, unless the supercode used
is only minimally simpler than the original code. This observation indicates that,
at the price of a very small degradation in bit error rate (BER) performance, a
significant computational savings can be achieved by following rule (2.4) rather
than (2.6). By using the Fano metric as a starting point for our derivation, we
can obtain an attractive sub-ML supercode heuristic with the same complexity as
the A* variant, but leading to a much faster tree search.
The Fano metric, when used in sequential decoders, usually appears in the form

    f_Fano(x_{1,L}) = Σ_{i=1}^{L} [ log2 ( P(y_i|x_i) / P(y_i) ) − R ].    (2.17)

By adding terms independent of x_{1,L} and writing out P(y_i) using the total probability rule, we obtain the equivalent metric

    f_Fano(x_{1,L}) = Σ_{i=1}^{L} log2 P(y_i|x_i) + Σ_{i=L+1}^{N} [ log2 ( Σ_{x_i ∈ {0,1}} P(y_i|x_i)/2 ) + R ],    (2.18)
which conforms to the general form (2.2) of the algorithm A metric. The heuristic part of (2.18) can be further rewritten as

    h_Fano(x_{1,L}) = log2 [ ( Σ_{x ∈ S(x_{1,L})} Π_{i=L+1}^{N} P(y_i|x_i) ) / |S(x_{1,L})| ] + log2 |C(x_{1,L})|,    (2.19)

which reveals an embedded averaging operation over S(x_{1,L}), the set of all possible binary sequences starting with x_{1,L} (here S = GF(2)^N). The presence of such averaging is not surprising, since the Fano metric was derived for random codes. In our attempt to generalize the Fano metric, we now propose to change the domain of this averaging from GF(2)^N to one of the supercodes S defined in Section 2.2. We might interpret this change as viewing the code C as a random code with codewords drawn from the supercode S rather than from GF(2)^N. Performing such a substitution in (2.19) and additionally splitting the terms involving the set cardinalities, we obtain
    h^1_S(x_{1,L}) = max*_{x ∈ S(x_{1,L})} Σ_{i=L+1}^{N} [ log2 P(y_i|x_i) − R_S + R ],    (2.20)

where R_S is the rate of the supercode S and the max* operation is defined as

    max*_{i ∈ I} b_i ≜ log2 Σ_{i ∈ I} 2^{b_i}.    (2.21)

Just as in the case of the supercode heuristic presented in Section 2.3, h^1_S can only assume as many unique values as there are states in the trellis of S. Furthermore, it can be precomputed recursively by

    h^1_S(s_L) = max*_{(s_L, x_L, s_{L+1})} [ log2 P(y_L|x_L) − R_S + R + h^1_S(s_{L+1}) ],    (2.22)

which makes this task identical to the backward step of the BCJR algorithm (except for the bias term −R_S + R).

The heuristic h^1_S can be further fine-tuned to account for the fact that when we use it with S = C, it does not reduce to the Viterbi metric. The difference lies in the summarizing operation, which for the Viterbi metric is max and for h^1_S is max*. To allow for a smooth transition between these two operations, we introduce

    h^η_S(x_{1,L}) = max^η_{x ∈ S(x_{1,L})} Σ_{i=L+1}^{N} [ log2 P(y_i|x_i) − R_S + R ],    (2.23)
where the generalized max is defined as

    max^η_{i ∈ I} b_i ≜ η log2 Σ_{i ∈ I} 2^{b_i/η}.    (2.24)

Just like max and max*, the max^η operation is commutative and associative, and the multiargument version can be obtained by successive application of the two-argument version, given as

    max^η(b_1, b_2) = η log2 ( 2^{b_1/η} + 2^{b_2/η} ) = max(b_1, b_2) + η log2 ( 1 + 2^{−|b_1−b_2|/η} ),

which is used in the backward pass each time several branches leave the same state. By varying the parameter η between 0 and 1, max^η becomes more similar to one or the other operation. Unfortunately, other than using η = 1 for S = GF(2)^N and η = 0 for S = C, we have so far been unable to devise an analytical method of choosing η. We have observed, however, that the parameter can be used to control the tradeoff between the speed of the tree search and the BER. In the following section we selected η's by trial and error to achieve a desired BER.
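The max*, max, and generalized max^η summaries fit in a few lines of code. This is a sketch, not the dissertation's implementation; η is written out as a plain argument, and the shift by the running maximum is a standard numerical-stability trick.

```python
import math

def max_star(values):
    """max* of (2.21): log2 of the sum of 2^b_i (a base-2 Jacobian logarithm)."""
    m = max(values)
    return m + math.log2(sum(2.0 ** (b - m) for b in values))

def max_eta(values, eta):
    """Generalized max of (2.24): eta * log2 sum_i 2^(b_i / eta).
    eta = 1 recovers max*, while eta -> 0 approaches the plain max."""
    if eta == 0.0:
        return max(values)
    m = max(values)
    return m + eta * math.log2(sum(2.0 ** ((b - m) / eta) for b in values))

b1, b2 = -1.0, -2.5
# Two-argument identity: max^eta(b1,b2) = max(b1,b2) + eta*log2(1 + 2^(-|b1-b2|/eta))
lhs = max_eta([b1, b2], 0.3)
rhs = max(b1, b2) + 0.3 * math.log2(1.0 + 2.0 ** (-abs(b1 - b2) / 0.3))
print(abs(lhs - rhs) < 1e-12)  # True
```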
2.5 Simulation results
We have examined the performance of the proposed heuristics for decoding a (2,1,8) convolutional code G = (457, 755) (in octal notation), terminated after 2048 information bits and used for transmission over the binary-input Additive White Gaussian Noise channel. The supercodes, defined by their parity check deletion patterns p introduced in Section 2.2, which were used in conjunction with the h_S and h^η_S heuristics, are listed in Table 2.1.
TABLE 2.1

SUPERCODE PARAMETERS

Deletion pattern p         Rate R_S   Trellis states   Correction η
(1, 0, 0, 0, 1, 0, 0, 0)   14/16      4                0.955
(1, 0, 0, 1, 0, 0, 1, 0)   13/16      8                0.945
(1, 0, 1, 0, 1, 0, 1, 0)   12/16      16               0.94
(1, 1, 0, 1, 1, 0, 1, 0)   11/16      32               0.9
(1, 1, 1, 0, 1, 1, 1, 0)   10/16      64               0.75
(1, 1, 1, 1, 1, 1, 1, 0)   9/16       128              0.0
The number of node extensions performed by the stack algorithm with the h_S heuristic for a range of energies per information bit over noise spectral densities Eb/N0 is presented in Fig. 2.1. The figure demonstrates the tradeoff between the complexity of the supercode (measured in the number of trellis states per trellis segment) and the complexity of the tree search. Although using more and more complex supercodes leads to significant improvements over the MLSDA, it is clear that the ML decoding guarantee (given that decoding actually finishes) severely penalizes the A* approach compared to the Fano metric.

Alternatively, when the h^η_S heuristic is employed, a significant improvement compared to the Fano metric can be achieved. This fact is illustrated in Fig. 2.2. The particular values of the parameter η used for each supercode are included in Table 2.1. Since we determined that lower values of η generally lead to lower numbers of computations but higher BERs, we selected the values to be simulated as the lowest η's for which the BER performance loss compared to ML decoding is limited to 0.1 dB.
Figure 2.1. Average number of path extensions per coded bit performed by the stack algorithm with the ML supercode heuristic h_S.
Figure 2.2. Average number of path extensions per coded bit performed by the stack algorithm with the sub-ML supercode heuristic h^η_S.
2.6 Summary
Despite the limitation of sequential decoding to rates below the cutoff rate, general tree search decoding can finish in a small number of steps if a reliable heuristic path metric is used. We have presented two such heuristics based on
the concept of a supercode, both of which can be precomputed on a trellis of the
supercode before the actual tree search begins. Although this preprocessing step
requires performing additional computations, many more computations are saved
during the tree search phase, especially at rates above the cutoff rate.
CHAPTER 3
SOFT-OUTPUT EQUALIZATION WITH THE M-BCJR ALGORITHM
3.1 Introduction
Efficient communication over channels introducing inter-symbol interference
(ISI) often requires the receiver to perform channel equalization. Turbo equaliza-
tion [18] is a technique in which decoding and equalization are performed itera-
tively, similar to turbo-decoding of serially-concatenated convolutional codes [3].
As depicted in Figure 3.1, the key element of the receiver employing this method
is a soft-input soft-output (SISO) demodulator/equalizer (from now on referred
to as just an equalizer), accepting a priori likelihoods of coded bits from the SISO
decoder, and producing their a posteriori likelihoods based on the noisy received
signal.
The SISO algorithm that computes the exact values of the a posteriori likeli-
hoods is the BCJR algorithm [2]. The complexity of a BCJR equalizer is propor-
tional to the number of states in the trellis representing the modulation alphabet
and the ISI, and thus it is exponential in both the length of the channel impulse response (CIR) and in the number of bits per symbol in the modulator. This can be a serious drawback in some scenarios, e.g., transmission at a high data rate over
a radio channel, where a large signal bandwidth translates to a long CIR, and a
high spectral efficiency translates to a large modulation alphabet. Needed in such
cases are alternative SISO equalizers with the ability to achieve large complexity
savings at a cost of small performance degradation.
There have been two main trends in the design of such SISOs. The first one
relies on reducing the effective length of the channel impulse response, either by
linear processing (see, e.g., [38]), or interference cancellation via decision feed-
back. A particularly good algorithm in this category is the reduced-state BCJR
(RS-BCJR) [5], which performs the cancellation of the final channel taps on a
per-survivor basis. Iterative decoding with RS-BCJR is very stable, thanks to the
high quality of the soft outputs, but the receiver cannot use the signal power con-
tained in the cancelled part of the CIR. Another trend is to adapt hard-output
sequential algorithms [1] to produce soft outputs. Examples in this category are
the M-BCJR and T-BCJR algorithms [9], based on the M- and T-algorithms, and
the LISS algorithm [12] based on list sequential decoding. These algorithms have
no problem using the signal energy from the whole CIR, and offer much more
flexibility in choosing the desired complexity. However, their reliance on ignoring
unpromising paths in the trellis or tree causes a bias in the soft output (there are
more explored paths with one value of a particular input bit than another), which
negatively affects the convergence of iterative decoding.
In this chapter we present a new SISO equalization algorithm, inspired by both the M-BCJR and RS-BCJR, which shares many of their advantages but few of their weaknesses. We call this algorithm the M*-BCJR algorithm, since it resembles the M-BCJR in preserving only a fixed number of trellis states with the largest forward metric. Instead of deleting the excess states, however, the M*-BCJR dynamically merges them with the surviving states, a process that shares some similarity to the static state merging done on a per-survivor basis by the RS-BCJR. For the sake of simpler notation, we present the operation of all BCJR-based algorithms, including the M*-BCJR, in the probability domain. Each of them, however, can be implemented in the log domain for better numerical stability.
The rest of the chapter is structured as follows. Section 3.2 describes the communication system and the task of the SISO equalizer and introduces the notation. Section 3.3 reviews the structure of the BCJR, M-BCJR, and RS-BCJR algorithms, helping us to introduce the M*-BCJR in Section 3.4. Section 3.5 presents simulation results, and conclusions are given in Section 3.6.
3.2 Communication system
A communication system with turbo equalization is depicted in Figure 3.1. The
information bits are first arranged into blocks and encoded with a convolutional
code. The blocks of coded bits are permuted using an interleaver and mapped onto
a sequence of complex symbols by the modulator. (In general, the modulator can
have memory, but for simplicity we will assume a memoryless mapper.) The channel acts as a discrete-time finite impulse response (FIR) filter introducing
ISI, the output of which is further corrupted by additive white Gaussian noise
(AWGN). We assume the receiver knows the ISI channel coefficients and the noise
variance, and it attempts to recover the information bits by iteratively performing
SISO equalization and decoding.
The part of the system significant from the point of view of the equalizer is
shown in Figure 3.2. Let a = (a_1, a_2, . . . , a_L) denote a sequence of LK bits entering the modulator, arranged into L groups a_i = (a_i^1, a_i^2, . . . , a_i^K) of K bits. Each K-tuple a_i selects a complex-valued output symbol x_i from a constellation of size 2^K
Figure 3.1. Communication system with turbo equalization.

Figure 3.2. Part of the system to be soft-inverted by the SISO equalizer.
to be transmitted. The sequence of symbols y = (y_1, y_2, . . . , y_{L+S}) obtained at the receiver is modeled as

    y_i = Σ_{j=0}^{S} h_j x_{i−j} + n_i,    (3.1)

where S is the memory of the channel, h_j, j = 0, 1, . . . , S, are the channel coefficients, and n_i, i = 1, 2, . . . , L + S, are i.i.d. zero-mean complex-valued Gaussian random variables with variance σ² per complex dimension. Equation (3.1) assumes that x_i is zero outside i = 1, 2, . . . , L.
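A minimal simulation of the channel model (3.1); the function name and the sample taps are ours, chosen for illustration only.

```python
import random

def isi_channel(x, h, sigma):
    """Model (3.1): FIR filtering of the symbol block x by the taps
    h = (h_0, ..., h_S), plus i.i.d. zero-mean complex Gaussian noise with
    standard deviation sigma per complex dimension; x_i = 0 outside 1..L."""
    L, S = len(x), len(h) - 1
    y = []
    for i in range(L + S):  # produces y_1, ..., y_{L+S}
        s = sum(h[j] * x[i - j] for j in range(S + 1) if 0 <= i - j < L)
        n = complex(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
        y.append(s + n)
    return y

# Noiseless check with taps (1.0, 0.5) and BPSK symbols (1, -1):
print(isi_channel([1.0, -1.0], (1.0, 0.5), 0.0))  # [(1+0j), (-0.5+0j), (-0.5+0j)]
```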
The SISO equalizer for the above channel takes the received symbols y and the a priori log-likelihood ratios L_a(a_i^k) for each bit a_i^k, defined as

    L_a(a_i^k) = log [ P(a_i^k = +1) / P(a_i^k = −1) ],    (3.2)
and outputs the a posteriori L-values L(a_i^k)

    L(a_i^k) = log [ P(a_i^k = +1|y) / P(a_i^k = −1|y) ].    (3.3)

The values actually fed to the SISO decoder are extrinsic L-values, computed as L_e(a_i^k) = L(a_i^k) − L_a(a_i^k).

Let λ(a) denote the joint probability that a was transmitted and y was received. Then (3.3) can be expressed as

    L(a_i^k) = log [ Σ_{a: a_i^k=+1} λ(a) / Σ_{a: a_i^k=−1} λ(a) ],    (3.4)

where the summations are performed over all a consistent with a_i^k = ±1. Furthermore,

    λ(a) = P(a) Π_{i=1}^{L+S} (1/(2πσ²)) exp( −(1/(2σ²)) ||y_i − Σ_{j=0}^{S} h_j x_{i−j}||² ),    (3.5)

where h_j, j = 0, 1, . . . , S, and σ² are assumed known at the receiver and P(a) is obtained from L_a as

    P(a) = Π_{i=1}^{L} Π_{k=1}^{K} P(a_i^k),    (3.6)

with

    P(a_i^k = ±1) = exp(±L_a(a_i^k)) / (1 + exp(±L_a(a_i^k))).    (3.7)

Since the number of paths involved in the summations of (3.4) is extremely large for realistic values of K and L, a practical algorithm seeks to simplify or approximate this calculation.
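Definitions (3.2) and (3.7) amount to a standard conversion between L-values and bit probabilities; the helper names below are ours.

```python
import math

def prob_from_llr(La, a):
    """P(a_i^k = a) from the a priori L-value, per (3.7); a is +1 or -1."""
    return math.exp(a * La) / (1.0 + math.exp(a * La))

def llr_from_prob(p_plus):
    """Inverse map (3.2): L-value from P(a_i^k = +1)."""
    return math.log(p_plus / (1.0 - p_plus))

print(prob_from_llr(0.0, +1))                           # 0.5
print(round(llr_from_prob(prob_from_llr(1.2, +1)), 6))  # 1.2
```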
3.3 SISO equalization
3.3.1 The BCJR algorithm
The classical algorithm for efficiently computing (3.4) by exploiting the trellis structure of the set of all paths is the BCJR algorithm [2]. By defining the state s_i at time i as the past S input K-tuples, s_i = (a_{i−1}, . . . , a_{i−S}), and a branch metric γ(s_i, a_i) as

    γ(s_i, a_i) = P(a_i) (1/(2πσ²)) exp( −(1/(2σ²)) ||y_i − Σ_{j=0}^{S} h_j x_{i−j}||² ),    (3.8)

the path metric can be factored into

    λ(a) = Π_{i=1}^{L+S} γ(s_i, a_i).    (3.9)

For indices outside the range i = 1, . . . , L, the variables a_i are regarded as empty sequences with P(a_i = ∅) = 1.

For every trellis branch b_i = (s_i, a_i, s_{i+1}) starting in state s_i, labeled by input bits a_i, and ending in state s_{i+1}, the BCJR algorithm computes the sum of the path metrics λ(a) over all paths passing through this branch as

    Σ_{a: b_i} λ(a) = α(s_i) γ(s_i, a_i) β(s_{i+1}).    (3.10)

The computation of the forward state metrics α(s_i) is performed in the forward recursion for i = 1, 2, . . . , L + S − 1:

    α(s_{i+1}) = Σ_{b_i=(s_i, a_i, s_{i+1})} α(s_i) γ(s_i, a_i),    (3.11)
with the initial state value α(s_1) = 1. Similarly, the backward recursion computes the backward state metrics β(s_i) for i = L + S, L + S − 1, . . . , 2:

    β(s_i) = Σ_{b_i=(s_i, a_i, s_{i+1})} γ(s_i, a_i) β(s_{i+1}),    (3.12)

with the terminal state value β(s_{L+S+1}) = 1. With all α's, β's, and γ's computed, the summations over paths in (3.4) can be replaced by summations over branches,

    L(a_i^k) = log [ Σ_{b_i: a_i^k=+1} α(s_i) γ(s_i, a_i) β(s_{i+1}) / Σ_{b_i: a_i^k=−1} α(s_i) γ(s_i, a_i) β(s_{i+1}) ].    (3.13)

The completion phase, in which (3.13) is evaluated for every a_i^k, concludes the algorithm.
The complexity of the BCJR equalizer is proportional to the number of trellis states, 2^{KS}. The following subsections describe the operation of the RS-BCJR [5] and M-BCJR [9] algorithms, which preserve the general structure of the BCJR, but instead operate on dynamically built simplified trellises with a number of states controlled via a parameter. In the original form of both algorithms, the construction of this simplified trellis occurs during the forward recursion and is based on the values of the forward state metrics, while the backward recursion and the completion phase just reuse the same trellis.
3.3.2 The RS-BCJR algorithm
The way we will describe the operation of the RS-BCJR algorithm is slightly
different from the presentation in [5], but is in fact equivalent.
Let us consider two states in the trellis,

    s_i = (a_{i−1}, . . . , a_{i−S'}, a_{i−S'−1}, . . . , a_{i−S}),    (3.14)
    s'_i = (a_{i−1}, . . . , a_{i−S'}, a'_{i−S'−1}, . . . , a'_{i−S}),    (3.15)

differing only in the last S − S' binary K-tuples. Furthermore, consider two partial paths beginning in states s_i and s'_i and corresponding to the same partial input sequence a_{[i,L]} = (a_i, . . . , a_L). Both paths are guaranteed to merge after S − S' time indices, and hence their partial path metrics are

    λ(s_i, a_{[i,L]}) = Π_{j=i}^{i+S−S'−1} γ(s_j, a_j) Π_{j=i+S−S'}^{L} γ(s_j, a_j),    (3.16)
    λ(s'_i, a_{[i,L]}) = Π_{j=i}^{i+S−S'−1} γ(s'_j, a_j) Π_{j=i+S−S'}^{L} γ(s_j, a_j).    (3.17)

Additionally, close examination of (3.8) reveals that the difference between γ(s_j, a_j) and γ(s'_j, a_j) for j = i, . . . , i+S−S'−1 is not large. Hence, the difference between λ(s_i, a_{[i,L]}) and λ(s'_i, a_{[i,L]}) is also not large.

The RS-BCJR equalizer relies on the above observation and, for some predefined S' < S, declares states differing only in the last S − S' binary K-tuples indistinguishable. Every such set of states is subsequently reduced to a single state, by selecting the state with the highest forward metric and merging all remaining states into it. Here, we define merging of the state s'_i into s_i as updating the forward metric α(s_i) := α(s_i) + α(s'_i), redirecting all trellis branches ending at s'_i into s_i, and deleting s'_i from the trellis. This reduction is performed during the forward recursion, and the β's for the paths that originate from removed states need never be computed. The trellis that results has only 2^{KS'} states, compared
to 2^{KS} in the original trellis. The same trellis is then reused in the backward recursion and the completion stage.

The RS-BCJR equalizer is particularly effective when the final coefficients of the ISI channel are small in magnitude. Furthermore, the reduced-state trellis retains the same branch-to-state ratio (branch density) and has the same number of branches with a_i^k = +1 and a_i^k = −1 for any i and k, properties that ensure a high quality for the soft outputs and good convergence of iterative decoding. Unfortunately, the RS-BCJR algorithm cannot use the signal power in the final S − S' channel taps, effectively reducing the minimum Euclidean distance between paths. Moreover, the number of surviving states can only be set to a power of 2^K, which could be a problem for large K (e.g., for a system with 16QAM modulation, equalization using 16 states could result in poor performance, while 256 states could exceed acceptable complexity).
3.3.3 The M-BCJR algorithm
The M-BCJR algorithm is based on the M-algorithm [1], originally designed for the problem of maximum likelihood sequence estimation. The M-algorithm keeps track of only the M most likely paths at the same depth, throwing away any excess paths. In the M-BCJR equalizer this idea is applied to the trellis states during the forward recursion. At every level i, when all α(s_i) have been computed, the M states with the largest forward metrics are retained, and all remaining states are deleted from the trellis (together with all the branches that lead to or depart from them). The same trellis is then reused in the backward recursion and completion phase.

In [9] it was shown that the M-BCJR algorithm performs well when the state
reduction ratio 2^{KS}/M is not very large. Also, unlike the RS-BCJR algorithm, it can use the power from all the channel taps. For small M, however, the reduced trellis is very sparse, i.e., the branch-to-state ratio is much smaller than in the full trellis and there is often a disproportion between the number of branches labeled with a_i^k = +1 and a_i^k = −1 for any i and k. These factors reduce the quality of the soft outputs and the convergence performance and may require an alternative way of computing the a posteriori likelihoods (like the Bayesian estimation approach presented in [23]). Finally, the M-BCJR algorithm requires performing a partial sort (finding the M largest elements out of M·2^K) at every trellis section, which increases the complexity per state.
3.4 The M*-BCJR algorithm

In this section we demonstrate how the concept of state merging present in the RS-BCJR equalizer can be used to enhance the performance of the M-BCJR algorithm. We call the resulting algorithm the M*-BCJR algorithm.

During the forward recursion the M*-BCJR algorithm retains a maximum of M states for any time index i. Unlike the M-BCJR algorithm, however, the excess states are not deleted, but merely merged into some of the surviving states. This means that none of the branches seen so far are deleted from the trellis, but they are just redirected into a more likely state. The forward recursion of the algorithm can be described as follows:

1. Set i := 1. For the initial trellis state s_1, set α(s_1) := 1. Also, fix the set of states surviving at depth 1 to be S_1 := {s_1}.

2. Initialize the set of surviving states at depth i+1 to an empty set, S_{i+1} = ∅.
3. For every state s_i in the set S_i, and every branch b = (s_i, a_i, s_{i+1}) originating from that state, compute the metric γ(s_i, a_i), and add s_{i+1} to the set S_{i+1}.

4. For every state s_{i+1} in S_{i+1} compute the forward state metric α(s_{i+1}) as a sum of α(s_i)γ(s_i, a_i) over all branches b = (s_i, a_i, s_{i+1}) visited in step 3 that end in s_{i+1}.

5. If the number of states in S_{i+1} is no more than M, proceed to step 8. Otherwise continue with step 6.

6. Determine the M states in S_{i+1} with the largest value of the forward state metric. Remove all remaining states from S_{i+1} and put them in a temporary set S'_{i+1}.

7. Go over all states s'_{i+1} in the set S'_{i+1} and perform the following tasks for each of them:

   - Find a state s_{i+1} in S_{i+1} that differs from s'_{i+1} by the least number of final K-tuples a_j.
   - Redirect all branches ending in s'_{i+1} to s_{i+1}.
   - Add α(s'_{i+1}) to the metric α(s_{i+1}).
   - Delete s'_{i+1} from the set S'_{i+1}.

8. Increment i by 1. If i ≤ L + S − 1, go to step 2. Otherwise the forward recursion is finished.

The merging of s'_{i+1} into s_{i+1} in step 7 is also illustrated in Figure 3.3. The backward recursion and the completion phase are subsequently performed only over states remaining in the sets S_i and only over visited branches (i.e., branches for which the metrics were calculated in step 3).
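The forward recursion of steps 1-8 can be sketched as follows. States are held as tuples of the past S input symbols, the merge target is picked by simply counting differing positions (a stand-in for "least number of final K-tuples"), and all names and conventions here are illustrative, not the dissertation's code.

```python
def mstar_forward(M, S, alphabet, L, gamma):
    """M*-BCJR forward pass sketch: extend every branch (step 3), accumulate
    forward metrics (step 4), and when more than M states survive, merge
    each excess state into the closest survivor instead of deleting it
    (steps 5-7), redirecting its incoming branches."""
    alpha = {(None,) * S: 1.0}
    trellis = []                                  # visited branches per stage
    for i in range(L):
        nxt, branches = {}, []
        for s in alpha:
            for a in alphabet:                    # step 3
                s2 = (a,) + s[:-1]
                g = gamma(i, s, a)
                nxt[s2] = nxt.get(s2, 0.0) + alpha[s] * g   # step 4
                branches.append([s, a, s2, g])
        if len(nxt) > M:                          # steps 5-7
            ranked = sorted(nxt, key=nxt.get, reverse=True)
            survivors, excess = ranked[:M], ranked[M:]
            for se in excess:
                tgt = min(survivors,
                          key=lambda t: sum(u != v for u, v in zip(t, se)))
                nxt[tgt] += nxt.pop(se)           # merge, never delete
                for b in branches:
                    if b[2] == se:
                        b[2] = tgt                # redirect the branch
        trellis.append(branches)
        alpha = nxt
    return alpha, trellis
```

Because merging only redistributes probability mass, the total forward metric at each depth is the same as on the full trellis, and every visited branch survives into the backward recursion.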
Figure 3.3. Trellis section a) before and b) after merging an excess state s'_i into a surviving state s_i.
Just as for the M-BCJR, the M*-BCJR algorithm can use the power from all channel taps and offers full freedom in choosing the number of surviving states M. At the same time, the M*-BCJR never deletes visited branches, and hence it retains the branch density of the full trellis and avoids a disproportion between the number of branches labeled with a_i^k = +1 and a_i^k = −1. As a result, the soft outputs generated by the M*-BCJR equalizer ensure good convergence of the iterative receiver. Complexity-wise, the algorithm requires some additional processing per state (due to step 7) and some additional memory per branch (the ending state must be remembered for each branch). However, if we regard the calculation of the branch metrics as the dominant operation, the complexities of the M-BCJR, RS-BCJR, and M*-BCJR equalizers are the same for fixed M = 2^{KS'}.
3.5 Simulation results
To evaluate the performance of the M*-BCJR equalizer, we considered two turbo-equalization systems. Both systems used a recursive, memory 5, rate 1/2 terminated convolutional code as an outer code. The first system used BPSK
TABLE 3.1

SIMULATED TURBO-EQUALIZATION SCENARIOS

                         Scenario 1                      Scenario 2
Outer code               CC(2,1,5)                       CC(2,1,5)
Modulation               BPSK                            16QAM
Channel memory S         4                               2
CIR {h_0, . . . , h_S}   {0.45, 0.25, 0.15, 0.1, 0.05}   {1, 1, 1}
BCJR states              16                              256
Interleaver size         1024                            4096
No. of iterations        6                               6

modulation and a 5-tap channel (maximum 16 states), and a block of 507 information bits (size 1024 DRP [6] interleaver). The second system used 16QAM modulation, but only a 3-tap channel (maximum 256 states), and a block of 2043 information bits (size 4096 DRP interleaver). The remaining parameters and the channel impulse responses are summarized in Table 3.1.
Both systems were simulated with the M*-BCJR and RS-BCJR equalizers, for several values of M and S'. In each case we allowed the receiver to perform 6 iterations. The bit error rates Pe for a range of Eb/No (average energy per bit over noise spectral density) are plotted in Figure 3.4. To better illustrate the complexity-performance tradeoffs achievable with both algorithms, we also plotted the number of states M or 2^{KS'} against the Eb/No needed to achieve a certain Pe (10^−4 for system 1 and 10^−3 for system 2) in Figure 3.5.
The simulations demonstrate the superior performance of the M*-BCJR equalizer. In scenario 1, the M*-BCJR equalizer with 3 states outperforms the RS-BCJR with 8 states by 0.1 dB for Pe below 10^−4. When both algorithms use 4 states, the M*-BCJR equalizer offers a 0.7 dB gain compared to the RS-BCJR. In scenario 2, the M*-BCJR with 16 states achieves almost a 3 dB gain over the RS-BCJR with the same number of states.

Figure 3.4. Bit error rate of M*-BCJR and RS-BCJR for a) scenario 1 (BPSK) and b) scenario 2 (16QAM).

Figure 3.5. Number of states vs. Eb/No to reach the reference Pe for a) scenario 1 (BPSK, Pe = 10^−4) and b) scenario 2 (16QAM, Pe = 10^−3).
3.6 Summary
We have examined the problem of complexity reduction in turbo equalization for systems with large constellation sizes and/or long channel impulse responses. We have defined the operation of merging one state into another and used it to give an alternative interpretation of the RS-BCJR algorithm. Finally, we modified the M-BCJR algorithm, replacing the deletion of excess states by the merging of these states into the surviving states. The resulting algorithm, called the M*-BCJR algorithm, was shown to generate reduced-complexity trellises more suitable for SISO equalization than those obtained by the RS-BCJR and M-BCJR algorithms. Simulation results demonstrated very good performance for turbo-equalization systems employing the M*-BCJR, exceeding that of the RS-BCJR even with much smaller complexities.
CHAPTER 4
SERIAL CONCATENATIONS WITH SIMPLE BLOCK INNER CODES
4.1 Introduction
Serially concatenated codes (SCCs) [3] are one of the error control techniques that offer good error protection and efficient decoding using iterative turbo decoders [4]. A rate R_O·R_I SCC encoder first collects a block of information bits and encodes it using a rate R_O outer code. The resulting intermediate coded bit sequence is permuted using an interleaver and subsequently encoded using a rate R_I inner code. At the receiver, decoding is implemented using soft-input soft-output (SISO) decoders for each of the component codes, where the extrinsic information about the intermediate sequence is iteratively exchanged between the two decoders.

When an SCC is used to communicate over a binary-input additive white Gaussian noise (AWGN) channel, its performance is typically characterized by the average bit error rate (BER) as a function of the ratio Eb/N0 of the energy per information bit to the one-sided noise power spectral density. When plotted, the BER curve shows three distinct regions. The region of very low Eb/N0 is characterized by high error rates resulting from the inability of the iterative decoder to converge. In the region of high Eb/N0, called the error floor region, the iterative decoder almost always converges to the minimum-distance codeword, performing
nearly maximum likelihood (ML) decoding after just a few iterations. Finally, in
the middle Eb/N0 region, called the waterfall region and characterized by a rapid
drop in the BER, iterative decoding converges only for some received sequences,
and a large number of iterations may be required to approach the ML solution.
An SCC with good performance is characterized by the waterfall region located
at a low Eb/N0 and the error floor region located at a low BER. The standard de-
sign tools that provide good predictions about the performance of an SCC in the
error floor and waterfall regions are uniform interleaver analysis [3] and extrinsic
information transfer (EXIT) charts [35], respectively.
In this chapter we consider an SCC with an inner block code, as illustrated in Fig.
4.1. This is in contrast to the usual practice of using recursive convolutional codes
as inner codes, since inner block codes provide no asymptotic interleaver gain [3],
i.e., the error floor does not decrease indefinitely with increasing interleaver size.
However, for moderate and fixed interleaver sizes this is not a serious drawback.
Suppose that the outer code has a minimum output weight d_min^O and the inner block code has a minimum output weight d_1^I corresponding to an input sequence with weight one. Then, as long as the interleaver is able to spread the low weight outer codewords in such a way that every nonzero bit is placed in a separate inner block, the minimum distance of the SCC can be as high as d_min^O · d_1^I. Based on this straightforward observation, we can summarize our design criteria for a good SCC as follows:

- choose the outer code with a large d_min^O,
- choose the inner block code with a large d_1^I, and
- the outer and inner codes should have well-matched EXIT characteristics.
Figure 4.1. Serially concatenated coding with an inner block code.

Figure 4.2. Generalized single parity check encoder.
Perhaps the simplest block encoder that provides a large d_1^I can be obtained by modifying a single parity check (SPC) code as illustrated in Fig. 4.2. The encoder for this (K, L) generalized single parity check (GSPC) code computes a parity bit for the K information bits and then adds it modulo 2 to the first L information bits. Clearly, an input sequence with weight 1 produces an output sequence with weight L + 2 or L, depending on the bit location. Despite its simple structure, we show in the following sections that an SCC utilizing such an inner code can perform very well in both the waterfall and error floor regions of the BER curve.
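A sketch of the GSPC encoder of Fig. 4.2 (the function name and example parameters are ours), with the weight-1 property checked directly:

```python
def gspc_encode(u, L):
    """(K, L) GSPC encoder: form the parity of all K input bits, add it
    modulo 2 to the first L inputs, and append it as coded bit K+1."""
    parity = 0
    for bit in u:
        parity ^= bit
    return [(b ^ parity) if i < L else b for i, b in enumerate(u)] + [parity]

# Weight-1 inputs: output weight is L when the nonzero bit lies in the
# first L positions (its own flip cancels), and L + 2 otherwise.
K, L = 8, 3
weights = []
for pos in range(K):
    u = [0] * K
    u[pos] = 1
    weights.append(sum(gspc_encode(u, L)))
print(weights)  # [3, 3, 3, 5, 5, 5, 5, 5]
```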
The rest of the chapter is organized as follows. Section 4.2 presents a SISO decoder for the GSPC code. Section 4.3 derives bounds, based on the uniform interleaver analysis, on the ML-decoding performance of the SCC. Section 4.4 examines the
relation between the parameters K and L of the GSPC code and the shape of its EXIT curve. Section 4.5 presents simulation results for designed SCCs utilizing a GSPC code. Finally, some conclusions are drawn in Section 4.6.
4.2 Soft-output decoding of the GSPC code
A SISO decoder for an SPC code accepts a priori L-values L_a(u_n) for each
information bit u_n, n = 1, ..., K, channel L-values L(x_n) for each coded bit x_n,
n = 1, ..., K + 1, and produces extrinsic L-values L_e(u_n). These L-values are
respectively defined as

L_a(u_n) = log( Pr{u_n = 0} / Pr{u_n = 1} ),
L(x_n) = log( Pr{x_n = 0} / Pr{x_n = 1} ),
L_e(u_n) = log( Pr{u_n = 0 | x_1, ..., x_{K+1}} / Pr{u_n = 1 | x_1, ..., x_{K+1}} ) − L_a(u_n).
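A small numeric illustration of these definitions (the helper below is ours; it simply maps a bit probability to its L-value, whose sign gives the hard decision and whose magnitude gives the reliability):

```python
import math

def lvalue(p0):
    # log-likelihood ratio log(Pr{bit = 0} / Pr{bit = 1})
    return math.log(p0 / (1.0 - p0))

print(lvalue(0.5))        # 0.0: both values equally likely
print(lvalue(0.9) > 0)    # True: the bit is more likely a 0
print(lvalue(0.1) < 0)    # True: the bit is more likely a 1
```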
Soft-output decoding of SPC codes has been thoroughly studied in the literature
in the context of product codes [13], low density parity check (LDPC) codes
[37], repeat-accumulate codes [36], and others. Despite the similarities between
SPC and GSPC codes, such as having identical codebooks for even L (but different
input-output mappings), the techniques commonly used for decoding SPC
codes (e.g., the operation in [13]) cannot be easily generalized to GSPC codes.
Instead we propose to perform SISO decoding using the special 4-state trellis
illustrated in Fig. 4.3. The trellis has K sections, with the first L being of type I
and the remaining K − L of type II. The trellis state (l_n, g_n) at time n = 0, 1, ..., K
consists of two bits: the local parity bit generated by all input bits preceding a
given trellis section (l_n = Σ_{m=1}^{n} u_m mod 2) and the global parity bit generated by all
input bits (g_n = Σ_{m=1}^{K} u_m mod 2). Consistency requires that only (l_0, g_0) = (0, 0) and
(l_0, g_0) = (0, 1) are valid starting states and only (l_K, g_K) = (0, 0) and (l_K, g_K) =
(1, 1) are valid ending states. A trellis branch exists between states (l_{n−1}, g_{n−1})
and (l_n, g_n) if g_{n−1} = g_n. Each branch is labeled with a (u_n, x_n) pair, which
depends on the starting state, ending state, and the trellis section type. For type
I sections u_n = l_{n−1} + l_n and x_n = l_{n−1} + l_n + g_n, while for type II sections
u_n = x_n = l_{n−1} + l_n (all additions modulo 2).
The extrinsic L-values for the input bits can be obtained by performing a
BCJR-like processing [2] on the trellis defined above. First, we compute the
branch metrics γ_n(l_{n−1}, l_n, g_n) for every connected pair of states (l_{n−1}, g_{n−1} = g_n)
and (l_n, g_n) as

γ_n(l_{n−1}, l_n, g_n) = ((−1)^{u_n} / 2) L_a(u_n) + ((−1)^{x_n} / 2) L(x_n).
Then we recursively obtain the forward state metrics α_n(l_n, g_n) and the backward
state metrics β_n(l_n, g_n) as

α_n(l_n, g_n) = max( α_{n−1}(0, g_n) + γ_n(0, l_n, g_n), α_{n−1}(1, g_n) + γ_n(1, l_n, g_n) ),
β_n(l_n, g_n) = max( β_{n+1}(0, g_n) + γ_{n+1}(l_n, 0, g_{n+1}), β_{n+1}(1, g_n) + γ_{n+1}(l_n, 1, g_{n+1}) ).
The initial values for the state metrics are

α_0(0, 0) = α_0(0, 1) = 0,
α_0(1, 0) = α_0(1, 1) = −∞,
β_K(0, 0) = (1/2) L(x_{K+1}),
β_K(1, 1) = −(1/2) L(x_{K+1}),
β_K(0, 1) = β_K(1, 0) = −∞,

where the infinite values imply invalid states and the values assigned to the backward
state metrics account for the parity bit x_{K+1}.
Le(un) are then computed as
Le(un) =
maxun=0
(n1(ln1, gn) + n(ln1, ln, gn) + n(ln, gn))
maxun=1
(n1(ln1, gn) + n(ln1, ln, gn) + n(ln, gn)),
where the max operations are performed over all branches (ln1, ln, gn) with either
un = 0 or un = 1.
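The recursions above can be sketched directly in Python. This is a max-log (Viterbi-style) implementation following the max-based equations exactly as written; all function and variable names are our own:

```python
NEG_INF = float("-inf")

def gspc_siso(La, Lx, L):
    """Max-log SISO decoding of a (K, L) GSPC code on the 4-state trellis.

    La: K a priori L-values for u_1..u_K
    Lx: K + 1 channel L-values for x_1..x_{K+1}
    Returns the K output L-values from the alpha/beta recursions above.
    """
    K = len(La)

    def gamma(n, lp, lc, g):            # branch metric, n = 1..K
        u = lp ^ lc
        x = u ^ g if n <= L else u      # type I vs. type II section
        return ((-1) ** u) * La[n - 1] / 2 + ((-1) ** x) * Lx[n - 1] / 2

    # Forward recursion; only (l_0, g_0) = (0, 0) and (0, 1) are valid starts
    alpha = [{(l, g): NEG_INF for l in (0, 1) for g in (0, 1)} for _ in range(K + 1)]
    alpha[0][(0, 0)] = alpha[0][(0, 1)] = 0.0
    for n in range(1, K + 1):
        for l in (0, 1):
            for g in (0, 1):
                alpha[n][(l, g)] = max(alpha[n - 1][(lp, g)] + gamma(n, lp, l, g)
                                       for lp in (0, 1))

    # Backward recursion; beta_K accounts for the parity bit x_{K+1} = g
    beta = [{(l, g): NEG_INF for l in (0, 1) for g in (0, 1)} for _ in range(K + 1)]
    beta[K][(0, 0)] = Lx[K] / 2
    beta[K][(1, 1)] = -Lx[K] / 2
    for n in range(K - 1, -1, -1):
        for l in (0, 1):
            for g in (0, 1):
                beta[n][(l, g)] = max(beta[n + 1][(lc, g)] + gamma(n + 1, l, lc, g)
                                      for lc in (0, 1))

    # Output L-values: best u_n = 0 path minus best u_n = 1 path
    Lout = []
    for n in range(1, K + 1):
        best = {0: NEG_INF, 1: NEG_INF}
        for lp in (0, 1):
            for lc in (0, 1):
                for g in (0, 1):
                    m = alpha[n - 1][(lp, g)] + gamma(n, lp, lc, g) + beta[n][(lc, g)]
                    best[lp ^ lc] = max(best[lp ^ lc], m)
        Lout.append(best[0] - best[1])
    return Lout

# Demo with hypothetical values: for K = 4, L = 2, u = [1,0,1,0] encodes to
# x = [1,0,1,0,0]; strong channel L-values should recover the input signs.
out = gspc_siso([0.0] * 4, [-5.0, 5.0, -5.0, 5.0, 5.0], 2)
print([v < 0 for v in out])   # [True, False, True, False], i.e. u = 1,0,1,0
```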
4.3 Bounds on ML performance of SCCs with an inner GSPC code
It is possible to obtain accurate bounds on the BER performance of linear codes
under ML decoding that depend on the code structure entirely via the multiplicities
of codewords of given input and output weights. Additional techniques, such
as a uniform interleaver analysis, offer approximations to the codeword weight
spectrum of an SCC based entirely on the weight spectra of the inner and outer
[Figure here: the 4-state trellis with type I and type II sections; states (l, g), branches labeled with (u_n, x_n) pairs.]
Figure 4.3: Trellis for soft-output decoding of the generalized parity check code.
code [3]. In this section we will apply these techniques to SCCs with an inner
GSPC code.
Let A^O(W, H) denote the input-output weight enumerating function (IOWEF)
of the outer code, defined as

A^O(W, H) = Σ_{w=0}^{∞} Σ_{h=0}^{∞} A^O_{w,h} W^w H^h,

where A^O_{w,h} denotes the number of codewords with input weight w and output
weight h. By analogy, let A^I(H, X) and A^C(W, X) denote the IOWEFs of the
inner code and the SCC, respectively. If the IOWEF of the SCC is known, the
BER under ML decoding is upper-bounded by

P_b(E_b/N_0) ≤ Σ_{x=0}^{∞} Σ_{w=1}^{∞} w A^C_{w,x} Q(√(2 x R_o E_b/N_0)).
The exact computation of A^C(W, X), however, is infeasible for most practical
codes. Nevertheless, approximations to A^C(W, X) can still yield meaningful bounds,
as long as they accurately predict the multiplicities of the low-weight codewords.
Below we obtain two such approximations using different assumptions about the
interleaver: an idealized interleaver and a uniform interleaver. Both approaches
share the convenient property of depending on the structure of the outer code
only via its weight enumerating function.
4.3.1 An idealized interleaver
Consider an interleaver that is capable of spreading every codeword of the outer
code in such a way that every bit equal to one is mapped to a separate block of
the inner code. Of course, except for pathological cases, such an interleaver would
be impossible to construct. However, it is feasible to construct one that spreads
out all the low-weight codewords that determine the performance of the SCC for
medium and large Eb/N0. Hence, this idealized interleaver can provide a useful
asymptotic approximation to performance for very large interleaver sizes.
When a (K, L) GSPC code is used to encode an input sequence with a single
u_n = 1 and all remaining u_k = 0, k ≠ n, the resulting codeword has Hamming
weight L if n ≤ L and L + 2 if n > L. Assuming that on average a fraction L/K
of the input K-tuples will have n ≤ L, we can say that an outer codeword with
weight h will result in a concatenated codeword with weight h(L + 2(K − L)/K).
Hence

P_b(E_b/N_0) ≤ Σ_{h=0}^{∞} Σ_{w=1}^{∞} w A^O_{w,h} Q(√(2 h (L + 2(K − L)/K) R_o E_b/N_0)).
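This bound is straightforward to evaluate numerically. Below is a sketch (function names and the dictionary-based spectrum format are our own), assuming the outer weight spectrum A^O_{w,h} has been truncated to its dominant low-weight terms:

```python
import math

def q_func(z):
    # Gaussian tail probability Q(z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def pb_idealized(outer_spectrum, K, L, Ro, ebno_db):
    """Evaluate the idealized-interleaver BER bound above.

    outer_spectrum: dict mapping (w, h) -> A^O_{w,h}, truncated to the
    dominant low-weight outer codewords (an assumption of this sketch).
    """
    ebno = 10.0 ** (ebno_db / 10.0)
    avg = L + 2.0 * (K - L) / K   # average inner output weight per outer 1-bit
    return sum(w * a * q_func(math.sqrt(2.0 * h * avg * Ro * ebno))
               for (w, h), a in outer_spectrum.items())

# Hypothetical outer spectrum: one codeword of input weight 1, output weight 3
spectrum = {(1, 3): 1.0}
print(pb_idealized(spectrum, K=8, L=4, Ro=0.5, ebno_db=2.0))
```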
An example where a practical interleaver might provide performance close to the
idealized interleaver is an SCC with a convolutional outer code and an S-random or
linear interleaver, since the low-weight codewords are usually generated by single
error events.
4.3.2 A uniform interleaver
The uniform interleaver is a probabilistic device that maps any input sequence
to each of its possible permutations with equal probability. For an interleaver size
of NK (see Fig. 4.1), a codeword of weight h generated by the outer encoder
can be permuted into C(NK, h) distinct and equiprobable bit sequences, where
C(n, k) denotes the binomial coefficient. Then the IOWEF coefficients of the SCC
can be computed as

A^C_{w,x} = Σ_{h=0}^{∞} C(NK, h)^{−1} A^O_{w,h} A^I(h, x).
In our case the inner code, of input length NK, consists of N independent blocks
of length K. The IOWEF of the N block codes can be obtained from the IOWEF
A^P(H, X) for a single block as

A^I(H, X) = [A^P(H, X)]^N.
A^P(H, X) for a single (K, L) GSPC code can be obtained from the conditional
weight enumerating functions A^P(h, X), which we define separately for odd and
even h. For an input sequence with even weight h we clearly have

A^P_even(h, X) = C(K, h) X^h.
The situation is more complicated for odd h, since the output weight depends on
how many of the 1 bits are placed in the first L positions. In this case we obtain

A^P_odd(h, X) = Σ_{i=max(0, L+h−K)}^{min(h, L)} C(L, i) C(K − L, h − i) X^{L+h+1−2i}.
The IOWEF A^P(H, X) then equals

A^P(H, X) = Σ_{h=0}^{∞} H^{2h} A^P_even(2h, X) + Σ_{h=0}^{∞} H^{2h+1} A^P_odd(2h + 1, X).

The resulting A^P(H, X) can then be used in the previous equations to yield the
desired bound.
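For small K the conditional enumerators above are easy to compute exhaustively. A sketch (names ours) that returns the coefficients of A^P as a dictionary from (input weight, output weight) to multiplicity:

```python
from math import comb
from collections import defaultdict

def gspc_iowef(K, L):
    """Weight enumerator of one (K, L) GSPC block.

    Returns a dict mapping (input weight h, output weight x) to multiplicity,
    implementing the even/odd expressions above.
    """
    A = defaultdict(int)
    for h in range(K + 1):
        if h % 2 == 0:
            # even h: parity is 0, so the output weight equals h
            A[(h, h)] += comb(K, h)
        else:
            # odd h: the parity bit flips the first L positions; i of the h
            # ones land there, giving output weight L + h + 1 - 2i
            for i in range(max(0, L + h - K), min(h, L) + 1):
                A[(h, L + h + 1 - 2 * i)] += comb(L, i) * comb(K - L, h - i)
    return dict(A)

A = gspc_iowef(8, 4)
print(sum(A.values()))      # -> 256, one entry per input K-tuple
print(A[(1, 4)], A[(1, 6)]) # -> 4 4: weight-1 inputs give weight L or L + 2
```

Raising the resulting polynomial to the N-th power and weighting by the inverse binomial coefficient then yields the uniform interleaver approximation to A^C_{w,x} given earlier.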
4.3.3 Comparison with simulation results
In Fig. 4.4 we present a comparison between the actual BER curves and
the bounds obtained using idealized and uniform interleavers, and we note good
agreement between the theoretical and experimental results. The SCC consists
of a rate 1/2, 2-state convolutional outer code (CC) with generator polynomial
G(D) = [1, 1 + D], a 4096-bit dithered relative prime (DRP) interleaver [6], and
an inner GSPC code with K = 8 and L = 0, 2, 4, 6, 8.
4.4 EXIT chart analysis for GSPC codes
As a complement to the error floor analysis of the previous section, the EXIT
charts proposed by ten Brink [35] can be used to predict the convergence threshold
of iterative decoding, and hence the performance in the waterfall region. EXIT
charts characterize the average mutual information between the information or
coded bits and their extrinsic L-values at the output of a SISO decoder as a function
of the mutual information carried by the a priori L-values. When these mutual
information transfer curves for the inner and outer codes are plotted against each
other for a certain value of E_b/N_0, the lowest value of E_b/N_0 for which the curves
[Figure here: BER vs. Eb/N0 (dB) curves for (K, L) = (8, 0), (8, 2), (8, 4), (8, 6), (8, 8), with the idealized and uniform interleaver bounds.]
Figure 4.4. Comparison of simulation results and BER bounds for the
2-state, rate 1/2 outer convolutional code.
do not cross corresponds to the convergence threshold of the iterative decoder.
The EXIT curves for the GSPC code for some selected values of K and L are plotted
in Fig. 4.5. The relation between the curve shapes and the code parameters
can be summarized in the following observations:
- The value of I_E(I_A = 1) depends directly on the parameter L (the higher L, the higher I_E(I_A = 1)).
- The value of I_E(I_A = 0) also depends on L (the higher L, the lower I_E(I_A = 0)).
- The steepness of the I_E(I_A) curve in the neighborhood of I_A = 1 depends on K (the higher K, the steeper the curve).
[Figure here: two EXIT chart panels, IE(u) vs. IA(u); left panel for (K, L) = (8, 0), (8, 4), (8, 8), right panel for (K, L) = (8, 8), (16, 16), (32, 32).]
Figure 4.5. Dependence between the GSPC code parameters and the
EXIT curve shape.
4.5 Design examples
In this section we present two SCCs with an inner GSPC code that were
designed using the following steps. First, an outer code with a good tradeoff
between minimum distance and SISO decoding complexity was selected. The
second step involved choosing the parameter K of the inner GSPC code. Smaller
K allows for larger N for the same interleaver size, which in turn improves the
process of spreading low-weight outer codewords into separate inner code blocks,
making the system perform closer to the idealized interleaver bound. On the other
hand, too small a value of K leads to a reduced overall code rate and limits the
range of the parameter L that can be selected. We found that the useful values
for K are in the range of 8 to 24. Finally, the value of L was chosen so that the outer
and inner EXIT charts match as closely as possible.
The parameters of the two SCCs are as follows. We selected the outer codes
to be rate 1/2 convolutional codes G_SCC1 = [27_o, 31_o] and G_SCC2 = [561_o, 753_o],
which are the 16-state and 256-state codes with the largest possible free distance
[19]. The inner codes that provided the best match turned out to be (K, L) = (12, 8)
and (K, L) = (16, 8) GSPC codes, respectively. Furthermore, we used a 4032-bit
dithered relative prime (DRP) interleaver [6]. The EXIT charts illustrating the
degree of matching achieved are plotted in Fig. 4.6, while the corresponding BER
curves and the uniform interleaver bounds are plotted in Fig. 4.7. As can be
observed, both schemes perform about 1.5 dB away from the Shannon limit at
a BER of 10^-6 with 16 iterations and about 1.3 dB away with 32 iterations, and
according to the uniform interleaver bounds, both codes show potentially very low
error floors at BERs of 10^-10 (SCC1) and 10^-15 (SCC2). In addition, the EXIT
charts in Fig. 4.6 indicate that waterfall region performance within 0.3 dB of the
Shannon limit can be obtained with larger interleaver sizes.
4.6 Conclusions
In this chapter we showed that SCCs with simple inner block codes can achieve
very good performance in both the waterfall and error floor regions of the BER
curve. We proposed a simple inner block code providing a large value of d^I_1 and
a SISO algorithm needed for iterative decoding. We examined their properties
using both a uniform interleaver analysis and EXIT charts, and two example
SCCs exhibiting very promising performance were designed.
[Figure here: two EXIT chart panels at Eb/N0 = 0.5 dB; a) outer 16-state R = 1/2 CC with inner GSPC (K, L) = (12, 8), b) outer 256-state R = 1/2 CC with inner GSPC (K, L) = (16, 8).]
Figure 4.6. EXIT charts for a) SCC 1 with 16-state outer code, b) SCC 2 with 256-state outer code.
[Figure here: BER vs. Eb/N0 (dB) curves for 1, 2, 4, 8, 16, and 32 iterations; a) 16-state CC + GPC (12,8,1) with the uniform interleaver bound and the Shannon limit for BPSK at R = 6/13 (Eb/N0 ≈ 0.0 dB); b) panel continues.]