
DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Polar Codes for Identification Systems

LINGHUI ZHOU

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


TRITA-EECS-EX-2018:103

ISSN: 1653-5146

www.kth.se


Abstract

Identification systems are ubiquitous; consider, for example, biometric identification systems based on fingerprints or Face ID. The identification problem consists of two phases. In the enrollment phase, the user's data are captured, compressed and stored, for example by taking a fingerprint or extracting important features of a face. In the identification phase, an observation, such as a fingerprint or a face, is compared with the information stored in the database to provide an affirmative answer. Since the system involves many users, both storing the data and searching for the correct user are challenging.

This project aims to implement compression and identification algorithms for a high-dimensional identification system with M users, with polar codes as the main toolbox. Firstly, we implement polar codes for source compression and then design the corresponding identification mappings. The source compression can be seen as channel decoding of polar codes. In the identification phase, the observation can be seen as side information, so we consider Wyner-Ziv coding with polar codes to reconstruct and identify. In the next step, we implement polar codes for two-layer Wyner-Ziv coding for identification systems. This enables us to store the compressed data in separate databases and to perform the reconstruction in two stages. With the enrollment mapping and identification mapping implemented, we evaluate the performance of the designed identification systems in terms of identification error rate and complexity. Possible further directions are to implement more advanced algorithms, such as simplified or fast simplified successive cancellation encoding in source coding and universal decoding in identification.


Sammanfattning

Identification systems appear everywhere, for example biometric identification systems with fingerprints and face recognition. Fundamentally, the problem can be broken down into two phases. In the enrollment phase, data about the user are collected, compressed and stored, for example by taking fingerprints or extracting important facial details. In the identification phase, an observation, your fingerprint or face, is compared with previously stored information in order to give a positive answer. Since the system handles many users, both storage and the search for the correct user are challenging.

The purpose of this project is to design and implement efficient compression and identification algorithms for the high-dimensional identification system with M users. Polar codes are used as the main tool. First we implement polar codes for efficient source compression and then design the corresponding identification mapping. Source compression can be seen as the channel decoding of polar codes, and in the identification phase the observation can be seen as the side information, so we consider using Wyner-Ziv coding with polar codes to reconstruct and identify. In the next step, we implement polar codes for the secure Wyner-Ziv problem. This allows us to store compressed data in separate databases and reconstruct in two stages. With the enrollment mapping and identification mapping implemented, we evaluate the performance of the designed identification systems with measures such as identification error rate and computational complexity.


Acknowledgment

Firstly, I would like to express my sincerest gratitude to my examiner, Tobias Oechtering, Asst. Prof. at the Department of Information Science and Engineering of the Royal Institute of Technology (KTH), who provided me with the opportunity to do this master thesis and supervised it. I would like to thank my supervisor, Minh Thanh Vu, for his supervision, valuable advice and patience throughout this master thesis project. Finally, I would like to thank my family and friends for their constant support and encouragement.


List of Symbols and Abbreviations

Symbol            Definition
$X$               random variable
$\mathcal{X}$     alphabet
$x$               realization of $X$
$|\mathcal{X}|$   cardinality of the alphabet $\mathcal{X}$
$W(y|x)$          channel $W$
$C(W)$            capacity of channel $W$
$I(W)$            symmetric capacity of channel $W$
$Z(W)$            Bhattacharyya parameter
$W_N^{(i)}$       bit channel
$W^N$             $N$ channel uses
$W_N$             vector channel of size $N$
$\log(\cdot)$     logarithm of base 2
$h_2(p)$          binary entropy function, $-p \log p - (1-p)\log(1-p)$
$O(N)$            asymptotic complexity of $N$
$M$               number of users in an identification system
$\alpha * \beta$  $\alpha(1-\beta) + \beta(1-\alpha)$
$\mathrm{Ber}(p)$ Bernoulli distribution with expectation $p$

Abbreviation      Definition
B-DMC             Binary Discrete Memoryless Channel
BEC               Binary Erasure Channel
BSC               Binary Symmetric Channel
LR                Likelihood Ratio
LLR               Log-Likelihood Ratio
SC                Successive Cancellation decoder
SCL               Successive Cancellation List decoder
RV                Random Variable
MMI               maximum mutual information
ML                maximum likelihood
l.c.e.            lower convex envelope
iff               if and only if
i.i.d.            independent and identically distributed


Contents

1 Introduction
  1.1 Motivation
  1.2 Societal Impact
  1.3 Introduction of Identification Systems
  1.4 Thesis Outline

2 Polar Codes for Channel Coding
  2.1 Polarization Basics
    2.1.1 Binary Input Channels
    2.1.2 Binary Discrete Memoryless Channel
  2.2 Channel Transform
    2.2.1 Basic Channel Transform
    2.2.2 Recursive Channel Transform
    2.2.3 Channel Polarization
  2.3 Code Construction
  2.4 Polar Codes Achieve Channel Capacity
  2.5 Decoding Algorithms
    2.5.1 Successive Cancellation Decoder
  2.6 Successive Cancellation List Decoding
  2.7 Complexity Analysis

3 Polar Codes for Source Coding
  3.1 Source Coding Basics
  3.2 Successive Cancellation Encoder
  3.3 List based SC Encoder
  3.4 Simulation Results and Discussion

4 Polar Codes with Side Information
  4.1 Wyner-Ziv Problem
  4.2 Two-layer Wyner-Ziv Coding
    4.2.1 Two-layer Polar Coding
    4.2.2 Two-layer Wyner-Ziv Encoding
    4.2.3 Two-layer Wyner-Ziv Decoding
  4.3 Simulation Results
    4.3.1 One-layer Polar Codes for Wyner-Ziv Problem
    4.3.2 Two-layer Polar Codes for Wyner-Ziv Problem

5 Identification System
  5.1 Model of Identification System
  5.2 Polar Codes for Identification Systems
    5.2.1 Basic Identification Systems
    5.2.2 Wyner-Ziv Scenario Based Identification Systems
    5.2.3 Two-layer Identification Systems
    5.2.4 Two-layer Identification System with Pre-processing
  5.3 Simulation Results and Discussion
    5.3.1 One-layer Polar Codes for Identification Systems
    5.3.2 Two-layer Polar Codes for Identification Systems
  5.4 Complexity Analysis

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work

Bibliography


Chapter 1

Introduction

1.1 Motivation

The issue of biometric identification has attracted considerable attention in the last few decades. In [1], an introduction to biometric identification systems was given. Biometric identification systems, which use physical features to identify individuals, ensure greater security than traditional identification strategies. The most common traditional identification methods are passwords, keys, electronic tokens, and cards. Passwords can be forgotten, and keys or cards can be lost or stolen. The physical features of a human, however, are unique to each individual and not likely to change over a period of time. The most common physical features are the face, fingerprint, voice, iris, hand geometry, etc. In [2], a comparison between these five biometrics was given. Depending on the application and the characteristics of the biometric features, we can match a specific biometric feature to an application [3]. However, unlike the traditional identification methods, the implementation of a biometric identification system requires storing the biometric data of the users and reconstructing from the database. In this work, we are interested in finding an efficient compression mechanism and reconstruction method.

Polar codes, proposed by Arikan [6], are the first codes proven to achieve the capacity of binary-input discrete memoryless channels (B-DMCs). However, their performance at short block lengths is not satisfying. It was shown in [7], [8] that, with list-based successive cancellation decoding, polar codes achieve better performance at short block lengths. In [10], polar codes were also proved to be optimal for lossy source coding, and in [16] it was shown that they are optimal for the Wyner-Ziv scenario as well. Polar codes for two-layer Wyner-Ziv coding were discussed in [20]. In this project, we use polar codes for the source compression and reconstruction in an identification system. We also discuss how list-based successive cancellation influences the performance of source coding. In addition, we consider implementing polar codes for two-layer Wyner-Ziv coding, which generates two separate databases.


1.2 Societal Impact

Identification systems play an increasingly critical role in our society, so more accurate and faster identification becomes a crucial task. Biometric identification systems are being adopted broadly, by both private organizations and governmental institutes, regardless of political or economic structure, size or geography. It was estimated that the biometrics market will grow from 12.84 billion dollars in 2016 to 29.41 billion dollars by 2022 [4]. Biometric identification systems are thus set to play an ever more important role in many areas.

1.3 Introduction of Identification Systems

Biometric identification systems, which use physical features to identify individuals, ensure better security than passwords or numeric codes. Some of the most common and best-known features are the face, fingerprints, voice, irises, etc. Generally, a biometric identification system involves two phases. In the first phase, the enrollment phase, the physical features of the observed individuals are quantized and stored in the database. In the identification phase, a noisy version of the biometric data of an unknown individual is observed; the observed data are compared with the enrolled data in the database to decide which user is observed.

Since an identification system may involve a large number of individuals, it might be difficult to store the original data, and it becomes necessary to compress the data efficiently. Possible solutions are data mining, efficient data compression mechanisms, and storing data separately on several devices. In this thesis, we focus on the second and third aspects. In addition, we also consider implementing the corresponding identification mappings.

1.4 Thesis Outline

The report is organized as follows.

• In Chapter 2, we introduce the basics of polar codes, including channel polarization and the channel transform. Successive cancellation decoding for polar channel codes is also discussed.

• In Chapter 3, we introduce polar codes for source coding. Two encoders are applied: the successive cancellation encoder and the list-based successive cancellation encoder.

• In Chapter 4, we discuss polar codes for the Wyner-Ziv problem. The two-layer Wyner-Ziv problem is also discussed.

• In Chapter 5, we consider the model of an identification system and implement polar codes for data compression as well as reconstruction.

• In Chapter 6, we briefly discuss the conclusions, challenges and future work on polar codes for identification systems.


Chapter 2

Polar Codes for Channel Coding

In this chapter, we discuss the basics of polar codes for channel coding, based on the work of Arikan [6].

Polar code construction is based on the following transformation. Given the input $u_1^N$, the encoding operation is $x_1^N = u_1^N G_N$, and $x_1^N$ is transmitted through $N$ copies of a B-DMC $W$. The transformation matrix $G_N$ is defined as
$$G_N = G_2^{\otimes n} R_N,$$
where $G_2^{\otimes n}$ is the $n$th Kronecker power of $G_2$ and $R_N$ is the bit-reversal permutation matrix. The matrix $R_N$ can be interpreted as the bit-reversal operator: if $v_1^N = u_1^N R_N$, then $v_{b_1,\dots,b_n} = u_{b_n,\dots,b_1}$. The $n$th Kronecker power of $G_2$ is defined as
$$G_2^{\otimes n} = G_2 \otimes G_2^{\otimes (n-1)} = \begin{bmatrix} G_2^{\otimes (n-1)} & 0 \\ G_2^{\otimes (n-1)} & G_2^{\otimes (n-1)} \end{bmatrix}. \tag{2.1}$$
Here, the base is
$$G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$
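To make the transformation concrete, the following is a minimal Python sketch of our own (the function name is ours, not from the thesis) that computes $x_1^N = u_1^N G_N$. Rather than building $G_N$ explicitly, it uses the equivalent recursive form implied by the channel combining of equation (2.18) below; for $N = 4$ it reproduces multiplication by $G_4$ in (2.17).

    import numpy as np

    def polar_transform(u):
        """Compute x = u G_N over GF(2); len(u) must be a power of two.
        One level maps (u_odd, u_even) to
        (transform(u_odd XOR u_even), transform(u_even))."""
        N = len(u)
        if N == 1:
            return u.copy()
        u_odd, u_even = u[0::2], u[1::2]
        top = polar_transform((u_odd + u_even) % 2)
        bottom = polar_transform(u_even)
        return np.concatenate([top, bottom])

    # Example: u = (0, 1, 0, 0) gives the second row of G_4 in (2.17)
    print(polar_transform(np.array([0, 1, 0, 0])))  # [1 0 1 0]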

Next, applying the chain rule to the mutual information between the input $U_1^N$ and the output $Y_1^N$ gives
$$I(U_1^N; Y_1^N) = \sum_{i=1}^{N} I(U_i; Y_1^N \mid U_1^{i-1}) = \sum_{i=1}^{N} I(U_i; Y_1^N, U_1^{i-1}).$$

The essential observation behind polar codes is that, as the block size $N$ increases, the terms in the summation approach either 0 or 1. This phenomenon is referred to as channel polarization.

In the following sections, we will give more details about polar codes.


2.1 Polarization Basics

2.1.1 Binary Input Channels

Assume $\mathcal{X}$ is the field of size two and $\mathcal{Y}$ is an arbitrary set; $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets of a channel $W$, denoted $W: \mathcal{X} \to \mathcal{Y}$. The probability of observing $Y = y \in \mathcal{Y}$ given the input $X = x \in \mathcal{X}$ is
$$\Pr\{Y = y \mid X = 0\} = W(y|0) \quad \text{and} \quad \Pr\{Y = y \mid X = 1\} = W(y|1). \tag{2.2}$$

2.1.2 Binary Discrete Memoryless Channel

Among the binary input channels, the binary discrete memoryless channel (B-DMC) is an important class in information theory. We write $W^N$ to denote the channel corresponding to $N$ independent uses of $W$; that is, $W^N: \mathcal{X}^N \to \mathcal{Y}^N$ with $W^N(y_1^N \mid x_1^N) = \prod_{i=1}^{N} W(y_i \mid x_i)$.

Given a B-DMC $W$, we can measure its rate and reliability by two parameters: the symmetric capacity and the Bhattacharyya parameter.

Definition 1. The symmetric capacity is defined as
$$I(W) \triangleq \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2}W(y|0) + \frac{1}{2}W(y|1)}. \tag{2.3}$$

The symmetric capacity $I(W)$ equals the Shannon capacity when $W$ is a symmetric channel. A channel is symmetric if there exists a permutation $\pi$ such that for each output symbol $y$ we have $W(y|1) = W(\pi(y)|0)$. Two examples of symmetric channels are the binary symmetric channel (BSC) and the binary erasure channel (BEC).

A BSC with crossover probability $p_e$ is a B-DMC $W$ with output alphabet $\mathcal{Y} = \{0, 1\}$, $W(0|0) = W(1|1) = 1 - p_e$ and $W(1|0) = W(0|1) = p_e$. The Shannon capacity of a BSC($p_e$) is
$$C(\mathrm{BSC}(p_e)) = 1 - h_2(p_e),$$
where $h_2$ is the binary entropy function, $h_2(p_e) = -p_e \log p_e - (1-p_e)\log(1-p_e)$.

A BEC with erasure probability $p_e$ is a B-DMC with $W(0|0) = W(1|1) = 1 - p_e$ and $W(e|0) = W(e|1) = p_e$, where $e$ is the erasure symbol. The Shannon capacity of a BEC($p_e$) is
$$C(\mathrm{BEC}(p_e)) = 1 - p_e.$$

Another important parameter is the Bhattacharyya parameter, which is de-fined as follows.

Definition 2. The Bhattacharyya parameter is defined as
$$Z(W) \triangleq \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\,W(y|1)}. \tag{2.4}$$
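For the two channels above, (2.4) evaluates in closed form (a quick worked example of ours, not from [6]):
$$Z(\mathrm{BSC}(p_e)) = 2\sqrt{p_e(1 - p_e)}, \qquad Z(\mathrm{BEC}(p_e)) = \sqrt{p_e \cdot p_e} = p_e.$$
In both cases $Z(W)$ is close to 0 for a reliable channel and close to 1 for an unreliable one, consistent with its role as a reliability measure.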


In the code construction of polar codes, we focus on the Bhattacharyya parameter, which is an upper bound on the maximum likelihood (ML) decision error probability [6]. More details about the properties of the Bhattacharyya parameter and its relationship with the block error probability are discussed in later sections.

2.2 Channel Transform

In this section, we discuss the channel transform and the resulting transformation of $I(W)$ and $Z(W)$. We first introduce the basic (single-level) transform and then extend it to the recursive transform.

2.2.1 Basic Channel Transform

Let $\mathcal{X} = \{0, 1\}$, let $W: \mathcal{X} \to \mathcal{Y}$ be a B-DMC, and let $U_1^2$ be a random vector uniformly distributed over $\mathcal{X}^2$. Consider the combining of two channels as depicted in Figure 2.1.

Figure 2.1: The basic channel transform

Denote the input of the channel by $X_1^2 = U_1^2 G_2$ and let $Y_1^2$ be the corresponding outputs. We have the transition probabilities
$$W_2(y_1^2 \mid u_1^2) \triangleq \prod_{i=1}^{2} W(y_i \mid x_i), \tag{2.5}$$
where
$$G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{2.6}$$

This channel combining shows how two independent copies of $W$ are transformed into a new channel $W_2: \mathcal{X}^2 \to \mathcal{Y}^2$. Since the transform between $U_1^2$ and $X_1^2$ is linear and bijective, the mutual information between the input $U_1^2$ and the output $Y_1^2$ satisfies
$$I(U_1^2; Y_1^2) = I(X_1^2; Y_1^2) = 2 I(W). \tag{2.7}$$

Now we split the mutual information above by applying the chain rule:
$$I(U_1^2; Y_1^2) = I(U_1; Y_1^2) + I(U_2; Y_1^2 \mid U_1) = I(U_1; Y_1^2) + I(U_2; Y_1^2, U_1). \tag{2.8}$$


The term $I(U_1; Y_1^2)$ can be interpreted as the mutual information between the input $U_1$ and the output $Y_1^2$, with the other input $U_2$ treated as noise. We denote this "channel" by $W_2^{(1)}$. Similarly, the term $I(U_2; Y_1^2, U_1)$ can be seen as the mutual information between the input $U_2$ and the output $Y_1^2$ with $U_1$ known; we denote this "channel" by $W_2^{(2)}$.

Based on this, for any given B-DMC $W$ we can write $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ with
$$W_2^{(1)}(y_1^2 \mid u_1) \triangleq \sum_{u_2} \frac{1}{2} W_2(y_1^2 \mid u_1^2) = \sum_{u_2} \frac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2), \tag{2.9}$$
$$W_2^{(2)}(y_1^2, u_1 \mid u_2) \triangleq \frac{1}{2} W_2(y_1^2 \mid u_1^2) = \frac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2). \tag{2.10}$$
For simplicity, we define the following notation for these channel transformations. Given any B-DMC $W: \mathcal{X} \to \mathcal{Y}$, let
$$(W \boxtimes W)(y_1, y_2 \mid u_1) \overset{\mathrm{def}}{=} \frac{1}{2} \sum_{u_2 \in \mathcal{X}} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2), \tag{2.11}$$
$$(W \circledast W)(y_1, y_2, u_1 \mid u_2) \overset{\mathrm{def}}{=} \frac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2). \tag{2.12}$$

For any B-DMC $W$, the transformation $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ is rate-preserving and moves the symmetric capacity away from the center, in the sense that
$$I(W_2^{(1)}) + I(W_2^{(2)}) = 2 I(W), \tag{2.13}$$
$$I(W_2^{(1)}) \leq I(W) \leq I(W_2^{(2)}), \tag{2.14}$$
and the Bhattacharyya parameters of this transformation satisfy
$$Z(W_2^{(1)}) \leq 2 Z(W) - Z(W)^2, \tag{2.15}$$
$$Z(W_2^{(2)}) = Z(W)^2. \tag{2.16}$$
Equality holds in (2.15) only when $W$ is a BEC [6].

2.2.2 Recursive Channel Transform

In this section, we explain how the channel combining works at higher levels. When the block size is large enough, the channels polarize to either almost noiseless or almost completely noisy channels.

The next level of the channel transform, the second level ($n = 2$), is illustrated in Figure 2.2.

Figure 2.2: Second level channel transform

The mapping $u_1^4 \mapsto x_1^4$ from the input of $W_4$ to $W^4$ can be written as $x_1^4 = u_1^4 G_4$, where
$$G_4 = R_4 G_2^{\otimes 2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}. \tag{2.17}$$

Then the transition probabilities transform as $W_4(y_1^4 \mid u_1^4) = W^4(y_1^4 \mid u_1^4 G_4)$. This operation can be generalized to higher levels in a recursive manner. Define the channel $W_N: \mathcal{X}^N \to \mathcal{Y}^N$ through the channel combining
$$W_N(y_1^N \mid u_1^N) = W_{N/2}(y_1^{N/2} \mid u_{1,o}^N \oplus u_{1,e}^N)\, W_{N/2}(y_{N/2+1}^N \mid u_{1,e}^N), \tag{2.18}$$
where $u_{1,o}^N = (u_1, u_3, \dots, u_{N-1})$ and $u_{1,e}^N = (u_2, u_4, \dots, u_N)$. For the channel splitting, apply the chain rule as before:
$$I(U_1^N; Y_1^N) = \sum_{i=1}^{N} I(U_i; Y_1^N, U_1^{i-1}).$$

The term $I(U_i; Y_1^N, U_1^{i-1})$ can be seen as the mutual information between $U_i$ and $(Y_1^N, U_1^{i-1})$. Denote this channel by $W_N^{(i)}$, whose transition probability is
$$W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i) \triangleq P(y_1^N, u_1^{i-1} \mid u_i).$$

For any $n \geq 0$, $N = 2^n$, and $1 \leq i \leq N$, we have
$$W_{2N}^{(2i-1)}(y_1^{2N}, u_1^{2i-2} \mid u_{2i-1}) = \sum_{u_{2i}} \frac{1}{2} W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} \mid u_{2i}), \tag{2.19}$$
$$W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i-1} \mid u_{2i}) = \frac{1}{2} W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} \mid u_{2i}). \tag{2.20}$$

These channels can be written compactly as
$$W_N^{(2i-1)} = W_{N/2}^{(i)} \boxtimes W_{N/2}^{(i)}, \qquad W_N^{(2i)} = W_{N/2}^{(i)} \circledast W_{N/2}^{(i)}.$$

For $I(W_N^{(i)})$ and $Z(W_N^{(i)})$ at the higher levels, we have
$$I(W_N^{(2i-1)}) \leq I(W_{N/2}^{(i)}) \leq I(W_N^{(2i)}),$$
$$I(W_N^{(2i-1)}) + I(W_N^{(2i)}) = 2 I(W_{N/2}^{(i)}),$$
and
$$Z(W_N^{(2i-1)}) \leq 2 Z(W_{N/2}^{(i)}) - Z(W_{N/2}^{(i)})^2,$$
$$Z(W_N^{(2i)}) = Z(W_{N/2}^{(i)})^2.$$

Equality holds when $W$ is a BEC [6]. For this special case where $W$ is a BEC with erasure probability $e$, the Bhattacharyya parameters can be calculated recursively:
$$Z(W_N^{(2i-1)}) = 2 Z(W_{N/2}^{(i)}) - Z(W_{N/2}^{(i)})^2, \tag{2.21}$$
$$Z(W_N^{(2i)}) = Z(W_{N/2}^{(i)})^2, \tag{2.22}$$
with initialization $Z(W_1^{(1)}) = e$.

Let $P_B$ denote the block error probability. Then $P_B$ can be upper bounded as given in [6]:
$$P_B \leq \sum_{i \in F^c} Z(W_N^{(i)}), \tag{2.23}$$
where $F^c$ is the set of information indices defined in Section 2.3.

There is also an important property of the Bhattacharyya parameter for degraded channels. Consider two B-DMCs $W$ and $W'$ and suppose $W \preceq W'$ (i.e., $W$ is degraded with respect to $W'$); then $W_N^{(i)} \preceq W'^{(i)}_N$ and $Z(W_N^{(i)}) \geq Z(W'^{(i)}_N)$ [14, Lemma 4.7].

2.2.3 Channel Polarization

In the previous section, we discussed the channel transformation from $N$ copies of the channel $W$ to the polarized "channels" $\{W_N^{(i)}\}_{i=1}^{N}$. Figure 2.3 illustrates the result of polarization for the case where $W$ is a BEC with erasure probability $p_e = 0.5$. The bit channel capacities are calculated using the recursion
$$I(W_N^{(2i)}) = 2 I(W_{N/2}^{(i)}) - I(W_{N/2}^{(i)})^2, \tag{2.24}$$
$$I(W_N^{(2i-1)}) = I(W_{N/2}^{(i)})^2, \tag{2.25}$$
with initialization $I(W_1^{(1)}) = 1 - p_e$. This recursive relation follows from equations (2.21), (2.22) and the fact that $I(W_N^{(i)}) = 1 - Z(W_N^{(i)})$ for a BEC $W$.


Figure 2.3: $I(W_N^{(i)})$ for $i = 1, \dots, 512$ for a BEC(0.5)

Note that this recursion is valid for BECs; no comparable exact calculation is known for general B-DMCs.

Figure 2.3 shows that $I(W_N^{(i)})$ tends to approach 0 for smaller indices and 1 for larger indices. It was proved in [6] that if the block length $N$ is sufficiently large, $I(W_N^{(i)})$ approaches either 0 or 1, as implied by the following theorem [6].

Theorem 2.1. For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \dots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.

2.3 Code Construction

We use the polarization effect for code construction. The idea of polar coding is to send data only through the channels for which $Z(W_N^{(i)})$ approaches 0. Figure 2.4 illustrates the code construction for polar codes with block length $N = 8$, $K = 4$, assuming the channel is $W = \mathrm{BEC}(\frac{1}{2})$. According to the discussion in Section 2.2.2, the Bhattacharyya parameters for a BEC can be calculated directly with equations (2.21) and (2.22). We rank the Bhattacharyya parameters and select the channels with the smallest $Z(W_N^{(i)})$ as information bits, which are $U_4$, $U_6$, $U_7$ and $U_8$.

Therefore, the code construction problem for polar codes reduces to finding the Bhattacharyya parameters $Z(W_N^{(i)})$. To construct an $(N, K)$ polar code, we first calculate $Z(W_N^{(i)})$ for each $i \in \{1, \dots, N\}$ and divide the indices into two parts: the free (information) set $F^c$ and the frozen set $F$. We use the indices in $F^c$ to transmit information and fix the indices in $F$ to some known values, usually 0.


Figure 2.4: Code construction for polar codes with $N = 8$, $K = 4$, $W = \mathrm{BEC}(\frac{1}{2})$. The Bhattacharyya parameters, their ranks, and the resulting assignment are:

Bit    $Z(W_8^{(i)})$   Rank   Assignment
U1     0.9961           8      frozen
U2     0.8789           7      frozen
U3     0.8086           6      frozen
U4     0.3164           4      data
U5     0.6836           5      frozen
U6     0.1914           3      data
U7     0.1211           2      data
U8     0.0039           1      data

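This construction is easy to reproduce in code. The sketch below (our own illustration; the function names are not from the thesis) evaluates the BEC recursion (2.21)-(2.22) and picks the $K$ indices with the smallest Bhattacharyya parameters; for $N = 8$, $e = 0.5$ it returns exactly the values and information set shown in Figure 2.4.

    import numpy as np

    def bec_bhattacharyya(n, e):
        """Z(W_N^{(i)}) for all bit channels of a BEC(e), N = 2**n."""
        z = np.array([e])
        for _ in range(n):
            out = np.empty(2 * len(z))
            out[0::2] = 2 * z - z**2   # odd (2i-1) channels, eq. (2.21)
            out[1::2] = z**2           # even (2i) channels, eq. (2.22)
            z = out
        return z

    def construct_polar_code(n, K, e):
        """Information set F^c: the K indices with the smallest Z."""
        z = bec_bhattacharyya(n, e)
        return np.sort(np.argsort(z)[:K])

    z = bec_bhattacharyya(3, 0.5)
    print(np.round(z, 4))  # [0.9961 0.8789 0.8086 0.3164 0.6836 0.1914 0.1211 0.0039]
    print(construct_polar_code(3, 4, 0.5) + 1)  # [4 6 7 8] (1-based, as in Figure 2.4)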

For BECs, the transformations of $Z(W_N^{(i)})$ and $I(W_N^{(i)})$ are known in closed form, so the code construction can be described precisely. For other channels, the code construction problem is more complex. Arikan proposed using a Monte-Carlo method for estimating the Bhattacharyya parameters [6]. First, generate samples of $(U_1^N, Y_1^N)$ with the given distribution. Then estimate $\{Z(W_N^{(i)})\}$ empirically as the expectation of the random variable
$$\sqrt{\frac{W_N^{(i)}(Y_1^N, U_1^{i-1} \mid U_i \oplus 1)}{W_N^{(i)}(Y_1^N, U_1^{i-1} \mid U_i)}}. \tag{2.26}$$
A successive cancellation (SC) decoder can be used for this computation because this random variable is the square root of the decision statistic. The details of SC decoding are introduced in the next section.

Algorithm 1 describes how to estimate ZN for a BSC with pre-definedcrossover probability p.


Algorithm 1 The Monte-Carlo estimation

Input: Sequence length N, crossover probability p, number of Monte-Carlo iterations Runs
Output: The estimated Z(W_N^{(i)}), i = 1, 2, ..., N

1: Z = zeros(N, 1)
2: for r = 1 : Runs do
3:     x = randi([0 1], N, 1)   (random binary input vector)
4:     y = bsc(x, p)
5:     for i = 1 : N do
6:         l = LLR_N^{(i)}(y_1^N, x_1^{i-1} | x_i)
7:         if l >= 0 then
8:             Z(W_N^{(i)}) = (Z(W_N^{(i)}) * (r - 1) + e^{-l/2}) / r
9:         else
10:            Z(W_N^{(i)}) = (Z(W_N^{(i)}) * (r - 1) + e^{l/2}) / r
11: return Z

2.4 Polar Codes Achieve Channel Capacity

In the previous sections, we have seen how the channels polarize and how to take advantage of the polarization for code construction. According to Theorem 2.1, the fraction of "clean" channels approaches $I(W)$; therefore the achievable rate is close to $I(W)$.

Recall that in the channel splitting we defined the bit channel $W_N^{(i)}$ with respect to the mutual information term $I(U_i; Y_1^N, U_1^{i-1})$. To realize such a channel, the decoder should have access to $U_1^{i-1}$ and the output $Y_1^N$. Therefore, consider the SC decoder, which decodes in the order $U_1, \dots, U_N$; in this way the decoder has an estimate of $U_1^{i-1}$ when decoding $U_i$. Based on this idea, Arikan proposed SC decoding based on computing the likelihood ratio (LR) [6],
$$L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \triangleq \frac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)},$$
and generates decisions as follows:

(a) If $i \in F$, then set $\hat{u}_i = u_i$.

(b) If $i \in F^c$, then calculate $L_N^{(i)}$ and set
$$\hat{u}_i = \begin{cases} 0 & L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \geq 1 \\ 1 & \text{otherwise.} \end{cases}$$

As stated in equation (2.23), the block error probability $P_B$ can be upper bounded as
$$P_B \leq \sum_{i \in F^c} Z(W_N^{(i)}).$$
For the block error probability $P_B$ to become sufficiently small or vanish, the Bhattacharyya parameters $Z(W_N^{(i)})$ for $i \in F^c$ should approach 0. In [17], Arikan and Telatar obtained the following result, which gives the rate at which $Z(W_N^{(i)})$ approaches 0.


Theorem 2.2. Given a B-DMC $W$ and any $\beta < \frac{1}{2}$,
$$\lim_{n \to \infty} \Pr\!\left( Z(W_N^{(i)}) \leq 2^{-N^{\beta}} \text{ for } i \in \{1, \dots, N\} \right) = I(W). \tag{2.27}$$

In [6], Arikan proved polar codes achieve the symmetric capacity.

Theorem 2.3. Given a B-DMC $W$ and a fixed rate $R < I(W)$, for any $\beta < \frac{1}{2}$ there exists a sequence of polar codes of rates $R_N < R$ such that the block error probability satisfies
$$P_N = O(2^{-N^{\beta}}).$$

This can be proved with the following code construction [14]. For any $0 < \beta < \frac{1}{2}$ and $\varepsilon > 0$, choose the frozen set $F$ as
$$F = \left\{ i : Z(W_N^{(i)}) > \frac{1}{N} 2^{-N^{\beta}} \right\}.$$
Theorem 2.2 implies that for sufficiently large block length $N$,
$$\frac{|F^c|}{N} \geq I(W) - \varepsilon.$$
The block error probability of this scheme with SC decoding is
$$P_B(F) \leq \sum_{i \in F^c} Z(W_N^{(i)}) \leq 2^{-N^{\beta}},$$
which proves Theorem 2.3.

2.5 Decoding Algorithms

2.5.1 Successive Cancellation Decoder

We discussed SC decoding in the previous section; in this section we study its details. Recall that SC decoding is realized by calculating likelihood ratios (LRs). The LRs can be calculated using the recursive formulas of the channel splitting in equations (2.19) and (2.20), which give

$$L_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = \frac{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})\, L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}) + 1}{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})}$$
and
$$L_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = \left[ L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) \right]^{1 - 2\hat{u}_{2i-1}} \cdot L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2}).$$

This calculation can be recursively reduced to block length 1, with initialization $L_1^{(1)}(y_i) = W(y_i|0)/W(y_i|1)$, computed directly from the output sequence $y$ and the channel parameter.

To avoid a large number of multiplications, the successive cancellation can be carried out in the logarithm domain, where the algorithm becomes:

(a) If $i \in F$, then set $\hat{u}_i = u_i$.


(b) If $i \in F^c$, then calculate $L_N^{(i)}$ and set
$$\hat{u}_i = \begin{cases} 0 & \ln\!\left( \dfrac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)} \right) \geq 0 \\ 1 & \text{otherwise.} \end{cases}$$

For simplicity, denote $LLR_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})$ by $LLR_1$ and $LLR_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2})$ by $LLR_2$. In the log domain, the recursive calculation becomes

$$LLR_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = 2 \tanh^{-1}\!\left( \tanh\frac{LLR_1}{2} \tanh\frac{LLR_2}{2} \right), \tag{2.28}$$
$$LLR_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = (-1)^{\hat{u}_{2i-1}} LLR_1 + LLR_2 = \begin{cases} LLR_2 + LLR_1 & \hat{u}_{2i-1} = 0 \\ LLR_2 - LLR_1 & \hat{u}_{2i-1} = 1 \end{cases} \tag{2.29}$$
Using a proper approximation [27], equation (2.28) can be approximated by
$$LLR_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) \approx \mathrm{sgn}(LLR_1)\,\mathrm{sgn}(LLR_2)\,\min(|LLR_1|, |LLR_2|).$$
The recursive form in the logarithm domain is much simpler, consisting only of summations and sign operations. Algorithm 2 describes a successive cancellation decoder.

Algorithm 2 Successive cancellation decoder

Input: Received vector y of length N, frozen set F
Output: Decoded sequence u (and codeword x)

1: u = zeros(N, 1)
2: for i = 1 : N do
3:     if i ∈ F then
4:         u_i = 0
5:     else
6:         l = LLR_N^{(i)}(y_1^N, u_1^{i-1})
7:         if l >= 0 then
8:             u_i = 0
9:         else
10:            u_i = 1
11: x = u G_N
12: return x
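For concreteness, here is a compact recursive Python sketch of the same decoder. It is our own illustration, assuming the min-sum approximation above, a boolean frozen mask with frozen bits fixed to 0, and the natural-order convention $x = u F^{\otimes n}$ (i.e., the bit-reversal permutation $R_N$ is omitted, which only reorders the $u$-indices).

    import numpy as np

    def f(a, b):
        # check-node update: min-sum approximation of (2.28)
        return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

    def g(a, b, c):
        # variable-node update (2.29); c holds the re-encoded partial sums
        return b + (1 - 2 * c) * a

    def sc_decode(llr, frozen):
        """llr: channel LLRs (length a power of two); frozen: boolean mask.
        Returns (decoded u, re-encoded codeword x)."""
        N = len(llr)
        if N == 1:
            u = np.array([0]) if frozen[0] else np.array([0 if llr[0] >= 0 else 1])
            return u, u.copy()
        a, b = llr[:N // 2], llr[N // 2:]
        u1, x1 = sc_decode(f(a, b), frozen[:N // 2])      # W^- sub-channels first
        u2, x2 = sc_decode(g(a, b, x1), frozen[N // 2:])  # then W^+ sub-channels
        return np.concatenate([u1, u2]), np.concatenate([(x1 + x2) % 2, x2])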

2.6 Successive Cancellation List Decoding

The main drawback of the SC decoder is that once a wrong decision is made, it cannot be corrected. To avoid this problem and improve the performance, an improved version of SC decoding, the successive cancellation list (SCL) decoder, was introduced; it approaches the maximum likelihood decoder with acceptable complexity [7], [8].

Similar to the SC decoder, the SCL decoder uses the same recursive calculations to make decisions. However, instead of keeping a single decision path, the SCL decoder retains the $L$ most likely paths; at the end of the decoding stage, the most likely path is selected as the final output.

Figure 2.5: An example of a decoding tree for SC decoding

Successive cancellation decoding can be seen as a search process on a tree. Figure 2.5 illustrates successive cancellation decoding with block length $N = 4$. For simplicity, we assume all bits are information bits, which means that at each node we have to find the corresponding likelihood and make a decision. The nodes on the bold branches represent the nodes that are visited; the remaining nodes are not visited. Among the visited nodes, only the path that reaches a leaf is retained. In this example, the bold path reaching the leaf gives the decision $\hat{u}_1^4 = 0100$. However, only one decision path is kept at each level, and the final decoding path is not guaranteed to be the most probable one: once a bit is decided wrongly, there is no chance to correct it later in the decoding process.

Figure 2.6: An example of a decoding tree for SCL decoding.

Due to this limitation of the SC decoder, the successive cancellation list (SCL) decoder was proposed. Unlike the SC decoder, which keeps only one decision path, the SCL decoder allows at most $L$ (the list size) candidate paths to be explored further at each level.

In an SCL decoder, at each level the decoder doubles the number of candidate paths by appending a bit 0 or 1 to each candidate path and then selects the $L$ most probable paths. The decoder repeats this process at each further level until it reaches the leaf nodes.


Figure 2.6 illustrates list-based successive cancellation decoding with block length $N = 4$ and list size $L = 2$. As in the SC decoding tree in Figure 2.5, assume all bits are information bits. The bold branches are the paths that the decoder has visited. As Figure 2.6 shows, during SCL decoding two possible paths are retained at each level. At the next level, the four resulting sub-branches are considered, of which only two are kept. At the final level in this example, there are two candidate decisions, 0100 and 1111. Therefore, unlike in SC decoding, when an error occurs during decoding it is possible to correct it by keeping several paths.
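The list bookkeeping itself is simple; the sketch below is our own minimal illustration of the extend-and-prune loop, not the thesis's implementation. It assumes a hypothetical helper llr_for_prefix(u_prefix) that returns the LLR of the next bit given the already-decided prefix (in a real decoder this is the SC recursion above), and uses the common LLR-based path-metric update $\mathrm{PM} \mathrel{+}= \log(1 + e^{-(1-2u)\ell})$.

    import numpy as np

    def scl_decode(llr_for_prefix, N, frozen, L=2):
        """Sketch of SCL decoding: keep at most L paths; smaller metric = better."""
        paths = [(0.0, [])]  # (path metric, decided bits u_1..u_i)
        for i in range(N):
            candidates = []
            for pm, u in paths:
                llr = llr_for_prefix(u)  # LLR of bit i given this path's prefix
                for b in ((0,) if frozen[i] else (0, 1)):
                    # penalty ~0 when the LLR agrees with b, ~|llr| otherwise
                    penalty = np.log1p(np.exp(-(1 - 2 * b) * llr))
                    candidates.append((pm + penalty, u + [b]))
            paths = sorted(candidates, key=lambda t: t[0])[:L]  # prune to L best
        return paths[0][1]  # most likely path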

2.7 Complexity Analysis

One of the advantages of polar codes is that the asymptotic complexity is $O(N \log N)$ for both encoding and decoding.

First consider the encoding complexity. For the basic transformation shown in Figure 2.1, there is only one XOR operation. Let $\chi_E(N)$ denote the encoding complexity for block length $N$. Following the recursive channel transformation in encoding, the encoding complexity is [14]
$$\chi_E(N) = \frac{N}{2} + 2\chi_E\!\left(\frac{N}{2}\right) = \frac{N}{2} + 2\left(\frac{N}{4} + 2\chi_E\!\left(\frac{N}{4}\right)\right) = \cdots = \frac{N}{2}\log N.$$
So the encoding complexity is $O(N \log N)$.

So the encoding complexity is O(N logN).Now consider the decoding complexity and we use the similar idea when

analyzing the encoding complexity. Firstly, the decoding complexity for blocklength 2 is O(1). Then with the result in Section 2.5.1, the decoder works re-cursively and there are only summation or sign operation in decoding. Similarlylet XD(N) denotes the complexity of decoding with block length N . Then thereis

XD(N) = N + 2XD(N

2) = N + 2(

N

2+ 2XD(

N

4)) = · · · = N logN.

So the decoding complexity is $O(N \log N)$. As for SCL decoding, at each node the decoder has to perform the computation for each of the paths, so with list size $L$ the decoding complexity is $O(L N \log N)$.


Chapter 3

Polar Codes for Source Coding

In the previous chapter, we discussed polar codes for channel coding; in this chapter, we look at polar codes for lossy source coding. The purpose of channel coding is to protect data for transmission over a noisy communication channel, while the purpose of source coding is to represent data more efficiently. In [10], it was shown that polar codes are optimal for lossy source coding. Here, we introduce how to implement polar codes for source coding.

3.1 Source Coding Basics

There are two classes of source coding: lossless and lossy. Here we consider only lossy source coding, which achieves greater compression.

First, we define the Hamming distortion. Consider a binary symmetric source and let $d(\cdot, \cdot)$ denote the Hamming distortion; then
$$d(0, 0) = d(1, 1) = 0, \qquad d(0, 1) = d(1, 0) = 1.$$

Shannon's rate-distortion theorem [12] gives the minimum rate required under a fixed distortion. Consider an i.i.d. source $Y$ with probability distribution $P_Y(y)$; to achieve average distortion $D$, the required rate is at least
$$R(D) = \min_{\substack{p(y,x):\; \mathbb{E}_p[d(Y,X)] \leq D \\ p(y) = P_Y(y)}} I(Y; X).$$
Moreover, for any $R \geq R(D)$ there exists a sequence of codes $C_N$ and functions $f_N$ and $g_N$ such that $|C_N| \leq 2^{NR}$ and the average distortion $D_N$ approaches $D$.

If we add the constraint that the distribution over $X$ must be uniform, the above becomes the symmetric rate-distortion function,
$$R_s(D) = \min_{\substack{p(y,x):\; \mathbb{E}_p[d(Y,X)] \leq D \\ p(y) = P_Y(y),\; p(x) = \frac{1}{|\mathcal{X}|}}} I(Y; X).$$


Obviously, $R_s(D) \geq R(D)$. The importance of the symmetric rate-distortion function is that, for any $R \geq R_s(D)$, there exists a code of rate at most $R$ achieving average distortion $D$. For a uniformly distributed source, $R_s(D) = R(D)$.

It was proved that polar codes achieve the symmetric distortion bound [10], as discussed in the following sections.

3.2 Successive Cancellation Encoder

In channel coding, the SC operation is implemented at the decoder, while in source coding the SC operation is implemented at the encoder to map the source vector to a codeword; we call this operation SC encoding.

An SC encoder is given as follows. Consider an i.i.d. source realization $y$ of length $N$, and let $u(y, u_F)$ be the output of the following encoding operation with the polar code $C_N(F, u_F)$. Given $y$, for each $i \in \{1, \dots, N\}$:

(a) If $i \in F$, then set $\hat{u}_i = u_i$.

(b) If $i \in F^c$, then calculate $L_N^{(i)}(y_1^N, \hat{u}_1^{i-1})$ and set
$$\hat{u}_i = \begin{cases} 0 & \text{w.p. } \dfrac{L_N^{(i)}}{1 + L_N^{(i)}}, \\[2mm] 1 & \text{w.p. } \dfrac{1}{1 + L_N^{(i)}}. \end{cases}$$

It was proved in [10] that the values of the frozen bits do not influence the rate-distortion performance, so we can set the frozen bits to either 0 or 1. The decoding of this operation is given by $x = u G_2^{\otimes n}$, and the average distortion of this scheme is $D = \frac{1}{N}\mathbb{E}[d(Y, X)]$.

Note that in SC encoding, unlike the maximum likelihood (ML) rule in SC decoding, $\hat{u}_i$ is chosen randomly with a probability related to $L_N^{(i)}$. This decision rule is referred to as random rounding. Here, the probability $\frac{L_N^{(i)}}{1 + L_N^{(i)}}$ can be rewritten as

$$\frac{L_N^{(i)}}{1 + L_N^{(i)}} = \frac{\sum_{u_{i+1}^N} W_N(y_1^N \mid \hat{u}_1^{i-1}, 0, u_{i+1}^N)}{\sum_{u_i^N} W_N(y_1^N \mid \hat{u}_1^{i-1}, u_i^N)} = \frac{\sum_{u_{i+1}^N} P_{U_1^N \mid Y_1^N}(\hat{u}_1^{i-1}, 0, u_{i+1}^N \mid y_1^N)}{\sum_{u_i^N} P_{U_1^N \mid Y_1^N}(\hat{u}_1^{i-1}, u_i^N \mid y_1^N)} = P_{U_i \mid U_1^{i-1}, Y_1^N}(0 \mid \hat{u}_1^{i-1}, y_1^N).$$
Therefore, the probability $\frac{L_N^{(i)}}{1 + L_N^{(i)}}$ can be interpreted as the posterior probability of $U_i = 0$ given $(Y_1^N = y_1^N, U_1^{i-1} = \hat{u}_1^{i-1})$ under the distribution $P$.

1 ) under the distribution P .The algorithm of successive cancellation encoder is below in Algorithm 3.


Algorithm 3 Successive cancellation encoder

Input: Source vector y of length N, frozen set F
Output: Compressed sequence u

1: u = zeros(N, 1)
2: for i = 1 : N do
3:     if i ∈ F then
4:         u_i = 0
5:     else
6:         l = LLR_N^{(i)}(y_1^N, u_1^{i-1})
7:         L = 1 / (1 + e^l)
8:         draw r uniformly at random from [0, 1]
9:         if r <= 1 − L then
10:            u_i = 0
11:        else
12:            u_i = 1
13: return u

3.3 List based SC Encoder

As we have seen, the encoding operation in source coding mirrors the decoding operation in channel coding with polar codes. Intuitively, we can therefore implement SCL encoding in source coding.

Just as in channel decoding, once a decision error is made it cannot be undone. List-based decoding keeps the sequences with the highest probabilities and can partly avoid such errors. Similarly, we can implement an SCL encoder in source coding, which generates a compressed sequence that better describes the source data.

3.4 Simulation Results and Discussion

In this section, we discuss the performance of polar codes for source coding. Consider a binary symmetric source $Y$, and let the test channel be a binary symmetric channel with crossover probability $D$, BSC($D$). The distribution of the output $X$ induced by the test channel is then also binary symmetric. The rate-distortion function for this case is $R(D) = 1 - h_2(D)$, where $h_2(\cdot)$ is the binary entropy function. Since the frozen bits do not influence the performance, we set them to 0 for simplicity.
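The distortion bound plotted in the following figures (and the row $D = h_2^{-1}(1 - R)$ in Table 3.1) can be evaluated numerically; a small helper of our own, using bisection for the inverse, is:

    import numpy as np

    def h2(p):
        """Binary entropy function (base 2)."""
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def h2_inv(y, tol=1e-12):
        """Inverse of h2 restricted to [0, 1/2], by bisection."""
        lo, hi = 0.0, 0.5
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if h2(mid) < y else (lo, mid)
        return 0.5 * (lo + hi)

    # distortion bound D = h2^{-1}(1 - R) for the rates in Table 3.1
    for R in np.arange(0.1, 1.0, 0.1):
        print(f"R = {R:.1f}  D = {h2_inv(1 - R):.4f}")  # R = 0.1 -> D = 0.3160, etc.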

Figure 3.1 shows the performance of polar codes for lossy source coding with random rounding, for block lengths 128, 256, 512 and 1024. As seen in Figure 3.1, the larger the block length, the closer the points move to the rate-distortion bound. This result is consistent with that given in [10].

Figure 3.2 compares the performance of the SCL encoder and the SC encoder for block lengths (a) 64, (b) 128, (c) 256 and (d) 512. Here we again assume a binary symmetric source $Y$ and the test channel BSC($D$), with the corresponding rate-distortion function $R(D) = 1 - h_2(D)$. It can be seen that the list-based SC encoder performs better than the SC encoder, and there is a so-called diminishing return when increasing the list size.



Figure 3.1: The rate-distortion performance of polar codes for source coding.


The rate-distortion pairs in Figure 3.2 are also provided in Table 3.1. Once the list size exceeds 4, the performance tends to converge; this observation was also made in [11]. Also worth noting is that the performance of the different encoders seems to converge at higher rates $R$.


Figure 3.2: SCL vs SC encoder for block lengths 64, 128, 256 and 512. (Panels: (a) block length 64, (b) block length 128, (c) block length 256, (d) block length 512.)


(a) Block length N = 64

R                  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
D = h2^{-1}(1-R)   0.3160  0.2430  0.1893  0.1461  0.1100  0.0794  0.0532  0.0311  0.0130
SC                 0.3594  0.2856  0.2323  0.1823  0.1428  0.1108  0.0781  0.0553  0.0216
SCL4               0.3519  0.2772  0.2248  0.1767  0.1381  0.1083  0.0775  0.0547  0.0216
SCL8               0.3519  0.2772  0.2248  0.1767  0.1372  0.1083  0.0775  0.0547  0.0216
SCL16              0.3519  0.2772  0.2245  0.1767  0.1372  0.1083  0.0775  0.0547  0.0216

(b) Block length N = 128

R                  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
D = h2^{-1}(1-R)   0.3160  0.2430  0.1893  0.1461  0.1100  0.0794  0.0532  0.0311  0.0130
SC                 0.3466  0.2764  0.2239  0.1799  0.1370  0.1055  0.0748  0.0523  0.0272
SCL4               0.3377  0.2705  0.2153  0.1712  0.1314  0.1000  0.0725  0.0515  0.0270
SCL8               0.3377  0.2697  0.2145  0.1699  0.1306  0.0995  0.0719  0.0515  0.0270
SCL16              0.3377  0.2695  0.2142  0.1693  0.1305  0.0995  0.0719  0.0515  0.0270

(c) Block length N = 256

R                  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
D = h2^{-1}(1-R)   0.3160  0.2430  0.1893  0.1461  0.1100  0.0794  0.0532  0.0311  0.0130
SC                 0.3390  0.2698  0.2170  0.1745  0.1327  0.1012  0.0724  0.0474  0.0248
SCL4               0.3318  0.2626  0.2086  0.1666  0.1261  0.0964  0.0725  0.0449  0.0238
SCL8               0.3312  0.2612  0.2074  0.1648  0.1246  0.0954  0.0719  0.0449  0.0238
SCL16              0.3311  0.2608  0.2070  0.1639  0.1241  0.0950  0.0719  0.0448  0.0238

(d) Block length N = 512

R                  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
D = h2^{-1}(1-R)   0.3160  0.2430  0.1893  0.1461  0.1100  0.0794  0.0532  0.0311  0.0130
SC                 0.3388  0.2668  0.2114  0.1682  0.1303  0.0980  0.0694  0.0449  0.0229
SCL4               0.3318  0.2576  0.2036  0.1604  0.1234  0.0921  0.0664  0.0427  0.0222
SCL8               0.3307  0.2563  0.2022  0.1587  0.1219  0.0907  0.0659  0.0422  0.0222
SCL16              0.3302  0.2555  0.2013  0.1576  0.1212  0.0902  0.0654  0.0421  0.0222

Table 3.1: Rate-distortion pairs, SC vs. SCL encoding


Chapter 4

Polar Codes with Side Information

In the previous chapters, we saw that polar codes are optimal both for channel coding and for lossy source coding. It is natural to ask whether there is a scenario that combines quantization and error correction at the same time. Two such problems are the Wyner-Ziv and the Gelfand-Pinsker problems, which describe source coding with side information and channel coding with side information, respectively. Here, we consider only the Wyner-Ziv problem, which was first developed in [15]. In [16], polar codes for the Wyner-Ziv scenario were discussed, and it was shown that polar codes are optimal there as well.

We start by introducing the basics of the Wyner-Ziv problem and then show the optimality of polar codes in the Wyner-Ziv scenario. We then discuss the two-layer version of this problem.

4.1 Wyner-Ziv Problem

Consider the following system, as shown in Figure 4.1.

Figure 4.1: The Wyner-Ziv problem

In this scenario, assume $X$ is a binary symmetric source. The encoder compresses the source $X$ at rate $R$. The task of the decoder is to reconstruct $X$ from $(U, Y)$ within distortion $D$. The side information $Y$ is related to the source by $Y = X + \mathrm{Ber}(p)$.

The rate-distortion function for this problem is given by the lower convex envelope (l.c.e.) of $\{(R_{WZ}(D), D), (0, p)\}$ [15], where
$$R_{WZ}(D) = h_2(D * p) - h_2(D), \qquad D * p = D(1 - p) + p(1 - D).$$
However, instead of realizing the l.c.e., we will be concerned with achieving the rate-distortion form $R_{WZ}(D)$, as in [16].

Now let us consider the Wyner-Ziv problem with polar codes. The encoder in Figure 4.1 is an SC encoder.

Algorithm 4 Polar codes for the Wyner-Ziv problem (encoding)

Input: Pre-designed distortion D, source sequence x of length N, Bhattacharyya parameter vector Z_N(D) for BSC(D) of length N, sufficiently small number δ
Output: Compressed sequence u

1: u = zeros(N, 1)
2: R_s = 1 − h_2(D)
3: F_s = {i : Z_N^{(i)}(D) >= 1 − δ}
4: for i = 1 : N do
5:     if i ∈ F_s then
6:         u_i = 0
7:     else
8:         l = LLR_N^{(i)}(x_1^N, u_1^{i-1})
9:         L = 1 / (1 + e^l)
10:        draw r uniformly at random from [0, 1]
11:        if r <= 1 − L then
12:            u_i = 0
13:        else
14:            u_i = 1
15: return u

Algorithm 4 describes the encoding of polar codes for the Wyner-Ziv problem. For simplicity, let $Z_N^{(i)}(D)$ denote $Z(W_N^{(i)})$ when $W$ is a binary symmetric channel with crossover probability $D$, BSC($D$), and let $Z_N(D)$ denote the corresponding Bhattacharyya parameter vector of length $N$.


Algorithm 5 Polar codes for the Wyner-Ziv problem (decoding)

Input: Pre-designed distortion D, received sequence y of length N, Bhattacharyya parameter vector Z_N(D ∗ p) for BSC(D ∗ p) of length N, sufficiently small number δ
Output: Reconstructed sequence x

1: u = zeros(N, 1)
2: x = zeros(N, 1)
3: F_c = {i : Z_N^{(i)}(D ∗ p) >= δ}
4: for i = 1 : N do
5:     if i ∈ F_c \ F_s then
6:         u_i = ū_i   (the bit received from the encoder)
7:     else if i ∈ F_s then
8:         u_i = 0
9:     else
10:        l = LLR_N^{(i)}(y_1^N, u_1^{i-1})
11:        if l >= 0 then
12:            u_i = 0
13:        else
14:            u_i = 1
15: x = u G_N
16: return x

Algorithm 5 describes the decoding of polar codes for the Wyner-Ziv problem. Since we use SC encoding at the encoder, the average distortion between $\bar{U} = U G_N$ and $X$ is about $D$, and we obtain the Markov chain $U^N - X^N - Y^N$.

Figure 4.2: The Markov chain

It was proved in [14, Section 4.1] that the Wyner-Ziv rate-distortion function $R_{WZ}(D)$ can be achieved with the code construction given in Algorithms 4 and 5, namely
$$F_s = \{ i : Z_N^{(i)}(D) \geq 1 - \delta \} \quad \text{and} \quad F_c = \{ i : Z_N^{(i)}(D * p) \geq \delta \}. \tag{4.1}$$

It is clear that the BSC($D * p$) is degraded with respect to (wrt) the BSC($D$). Recalling the property of degraded channels from Section 2.2.2, we have $Z_N^{(i)}(D) \leq Z_N^{(i)}(D * p)$ for every $i$. Since $\delta$ is sufficiently small, $\delta \leq 1 - \delta$, so for any $i \in F_s$ we have $Z_N^{(i)}(D * p) \geq Z_N^{(i)}(D) \geq 1 - \delta \geq \delta$, which implies $i \in F_c$. Hence $F_s \subseteq F_c$, and the indices are separated into three parts: $F_s$, $F_c \setminus F_s$ and $F_c^c$. The relationship is depicted in Figure 4.3, where the rectangle is the set of all indices. Note that the indices may be distributed arbitrarily, i.e., they are not necessarily sorted linearly.

Figure 4.3: The subset structure of the frozen sets $F_s$ and $F_c$
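The nested structure in (4.1) and Figure 4.3 is straightforward to compute once the two Bhattacharyya profiles are available (e.g., from Monte-Carlo estimation as in Algorithm 1); the following is a small sketch of our own, with hypothetical function and variable names:

    def nested_frozen_sets(z_D, z_Dp, delta):
        """Nested Wyner-Ziv code design, eq. (4.1).
        z_D[i] ~ Z_N^{(i)}(D) (source code); z_Dp[i] ~ Z_N^{(i)}(D*p) (channel code)."""
        Fs = {i for i, z in enumerate(z_D) if z >= 1 - delta}
        Fc = {i for i, z in enumerate(z_Dp) if z >= delta}
        assert Fs <= Fc, "degradation of BSC(D*p) wrt BSC(D) guarantees the nesting"
        return Fs, Fc, Fc - Fs  # F_s, F_c, and the transmitted set F_c \ F_s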

According to Theorems 2.1 and 2.2, for fixed $\varepsilon > 0$ and sufficiently large $N$,
$$\frac{|F_s|}{N} \geq h_2(D) - \frac{\varepsilon}{2} \quad \text{and} \quad \frac{|F_c|}{N} \leq h_2(D * p) + \frac{\varepsilon}{2}.$$
From Algorithms 4 and 5, the encoder only transmits $U_{F_c \setminus F_s}$ to the decoder, so the rate is
$$R_u = \frac{|F_c| - |F_s|}{N} \leq h_2(D * p) - h_2(D) + \varepsilon.$$

With this nested construction, polar codes are optimal for the Wyner-Ziv problem [16]: for any rate $R > h_2(D * p) - h_2(D)$ and any $\beta \in [0, \frac{1}{2})$, there exists a sequence of nested polar codes of length $N$ with rates $R_N < R$ such that, under SC randomized-rounding encoding and SC decoding, the expected distortion satisfies
$$D_N = \mathbb{E}[d(X, \hat{X})] \leq D + O(2^{-N^{\beta}}),$$
and the block error probability satisfies
$$P_N \leq O(2^{-N^{\beta}}).$$

4.2 Two-layer Wyner-Ziv Coding

The idea of the two-layer Wyner-Ziv problem is based on successive refinement, first proposed in [18]. In the successive refinement setting, the encoder operates in two stages: in the first stage, it encodes the source at a lower rate and higher distortion; the second stage refines the output of the first stage, with knowledge of the source and the first-stage output. Successive refinement for the Wyner-Ziv scenario was discussed in [19], and polar codes for the two-layer Wyner-Ziv problem were studied in [20]; this provides an approach to secure and efficient data compression. The scheme is based on channel polarization with $q$-ary alphabets, as discussed in [22]. Here, we focus on the situation of one input and two outputs, where the two outputs are those of the first layer and the second layer.

4.2.1 Two-layer Polar Coding

Consider the Markov chain depicted in Figure 4.4, where $X$ is the input and $(U, V)$ are the outputs; $\alpha$ and $\beta$ are pre-defined distortion parameters. We encode $X$ using polar source coding to generate the output $U$, which gives distortion $\alpha * \beta$; for simplicity, we denote $\alpha * \beta$ by $D$. Then we find $V$ using the two-layer polar code construction, based on the input $X$, the first-layer output $U$, and the distortion parameters $\alpha$ and $\beta$.

Figure 4.4: Markov chain $U - V - X$.

For the two-layer polar code construction, we need the conditional distribution p_{UX|V}. Using the Bayes rule and the Markov chain property,

p_{UX|V} = p_{UXV}/p_V = p_{X|UV} p_{UV}/p_V = p_{X|V} p_{U|V},

p_{UX|V}(u = 0, x = 0|v = 0) = p_{X|V}(x = 0|v = 0) p_{U|V}(u = 0|v = 0) = (1 − α)(1 − β),
p_{UX|V}(u = 0, x = 0|v = 1) = p_{X|V}(x = 0|v = 1) p_{U|V}(u = 0|v = 1) = αβ,
p_{UX|V}(u = 0, x = 1|v = 0) = p_{X|V}(x = 1|v = 0) p_{U|V}(u = 0|v = 0) = (1 − α)β,
p_{UX|V}(u = 0, x = 1|v = 1) = p_{X|V}(x = 1|v = 1) p_{U|V}(u = 0|v = 1) = α(1 − β),
p_{UX|V}(u = 1, x = 0|v = 0) = p_{X|V}(x = 0|v = 0) p_{U|V}(u = 1|v = 0) = α(1 − β),
p_{UX|V}(u = 1, x = 0|v = 1) = p_{X|V}(x = 0|v = 1) p_{U|V}(u = 1|v = 1) = (1 − α)β,
p_{UX|V}(u = 1, x = 1|v = 0) = p_{X|V}(x = 1|v = 0) p_{U|V}(u = 1|v = 0) = αβ,
p_{UX|V}(u = 1, x = 1|v = 1) = p_{X|V}(x = 1|v = 1) p_{U|V}(u = 1|v = 1) = (1 − α)(1 − β).

With these conditional probabilities, we can find the channel transition probability matrix

W_{UX|V} =
[ (1−α)(1−β)   αβ
  (1−α)β       α(1−β)
  α(1−β)       (1−α)β
  αβ           (1−α)(1−β) ],   (4.2)

where the rows are indexed by (u, x) in the order (0,0), (0,1), (1,0), (1,1) and the columns by v ∈ {0, 1}.
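As a sanity check, the matrix in (4.2) can be assembled directly from the eight conditional probabilities above; a minimal sketch (numpy assumed):

import numpy as np

def W_UX_given_V(alpha, beta):
    # Rows are indexed by (u, x) in the order (0,0), (0,1), (1,0), (1,1);
    # columns by v in {0, 1}, matching equation (4.2).
    a, b = alpha, beta
    W = np.array([[(1 - a) * (1 - b), a * b],
                  [(1 - a) * b,       a * (1 - b)],
                  [a * (1 - b),       (1 - a) * b],
                  [a * b,             (1 - a) * (1 - b)]])
    assert np.allclose(W.sum(axis=0), 1.0)  # each column is a distribution
    return W

print(W_UX_given_V(0.1, 0.2))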

Applying the recursion as in equation (2.18), we obtain

W_N(u_1^N, x_1^N | v_1^N) = W_{N/2}(u_1^{N/2}, x_1^{N/2} | v_{1,o}^N ⊕ v_{1,e}^N) W_{N/2}(u_{N/2+1}^N, x_{N/2+1}^N | v_{1,e}^N). (4.3)

Similar to equations (2.19) and (2.20), we have the following result for two-layer polar codes. For any n ≥ 0, N = 2^n and 1 ≤ i ≤ N,

W_{2N}^(2i−1)(u_1^{2N}, x_1^{2N}, v_1^{2i−2} | v_{2i−1})
= Σ_{v_{2i}} (1/2) W_N^(i)(u_1^N, x_1^N, v_{1,o}^{2i−2} ⊕ v_{1,e}^{2i−2} | v_{2i−1} ⊕ v_{2i}) W_N^(i)(u_{N+1}^{2N}, x_{N+1}^{2N}, v_{1,e}^{2i−2} | v_{2i}), (4.4)

W_{2N}^(2i)(u_1^{2N}, x_1^{2N}, v_1^{2i−1} | v_{2i})
= (1/2) W_N^(i)(u_1^N, x_1^N, v_{1,o}^{2i−2} ⊕ v_{1,e}^{2i−2} | v_{2i−1} ⊕ v_{2i}) W_N^(i)(u_{N+1}^{2N}, x_{N+1}^{2N}, v_{1,e}^{2i−2} | v_{2i}). (4.5)

As we can see, the two-layer polar codes follow the same channel transformation as in the one-layer case.



4.2.2 Two-layer Wyner-Ziv Encoding

The code design for two-layer polar encoding is as follows. Encoding U follows the same procedure as discussed in Section 3.2. For the code design of encoding V, the indices are partitioned into three parts: A_V, F_V and D_V. We determine the second-layer output {V_i, i ∈ A_V} with knowledge of the first-layer output U and the source X, and decide {V_i, i ∈ D_V} knowing only U. F_V is the set of indices whose channels are unreliable; it is shared between the encoder and the decoder. For simplicity, we set the frozen symbols to 0.

Before proceeding, recall Theorem 2.1, which concerns the rate at which the Bhattacharyya parameter approaches 0 or 1 in the one-layer setup. A more general result was obtained in [21].

Theorem 4.1. For any β ∈ [0, 1/2), i.i.d. random variables (X, Y) and U^N = X^N G_N, we have

lim_{N→∞} (1/N) |{i : Z(U_i | U_1^{i−1}, Y_1^N) ≤ 2^(−N^β) and Z(U_i | U_1^{i−1}) ≥ 1 − 2^(−N^β)}| = I(X; Y), (4.6)

lim_{N→∞} (1/N) |{i : Z(U_i | U_1^{i−1}, Y_1^N) ≥ 1 − 2^(−N^β) or Z(U_i | U_1^{i−1}) ≤ 2^(−N^β)}| = 1 − I(X; Y). (4.7)

This implies that for sufficiently large sequence length N and any R_{aV} > I(X; V|U), we can find a set of indices A_V such that |A_V| = N R_{aV} and

A_V^c = {i : Z(V_i | V^{i−1}, U_1^N, X_1^N) ≥ 1 − δ or Z(V_i | V^{i−1}, U_1^N) ≤ δ}.

Hence

A_V = {i : Z(V_i | V^{i−1}, U_1^N, X_1^N) < 1 − δ and Z(V_i | V^{i−1}, U_1^N) > δ}.

We further define F_V and D_V as

F_V = {i ∈ A_V^c : Z(V_i | V^{i−1}, U_1^N, X_1^N) ≥ 1 − δ},
D_V = {i ∈ A_V^c : Z(V_i | V^{i−1}, U_1^N) ≤ δ}.

Algorithm 6 describes the encoding of the two-layer polar codes.



Algorithm 6 Two-layer Polar Codes (Encoding)

Input: Pre-defined crossover probabilities α and β, original data x of length N, the Bhattacharyya parameters Z_N(α ∗ β), Z_N(α, β) and Z_N(α), each of length N, sufficiently small number δ
Output: Compressed sequences u and v

1: u = zeros(N, 1)
2: F_U = {i : Z_N^(i)(α ∗ β) ≥ 1 − δ}
3: for i = 1 : N do
4:   if i ∈ F_U then
5:     u_i = 0
6:   else
7:     l = LLR_N^(i)(x_1^N, u_1^{i−1})
8:     if l ≥ 0 then
9:       u_i = 0
10:    else
11:      u_i = 1
12: v = zeros(N, 1)
13: A_V = {i : Z_N^(i)(α, β) ≤ 1 − δ and Z_N^(i)(α) ≥ δ}
14: F_V = {i ∈ A_V^c : Z_N^(i)(α, β) ≥ 1 − δ}
15: D_V = {i ∈ A_V^c : Z_N^(i)(α) ≤ δ}
16: for i = 1 : N do
17:   if i ∈ A_V then
18:     l = LLR_N^(i)(u_1^N, x_1^N, v_1^{i−1})
19:     if l ≥ 0 then
20:       v_i = 0
21:     else
22:       v_i = 1
23:   else if i ∈ D_V then
24:     l = LLR_N^(i)(u_1^N, v_1^{i−1})
25:     if l ≥ 0 then
26:       v_i = 0
27:     else
28:       v_i = 1
29: return v, u

4.2.3 Two-layer Wyner-Ziv Decoding

For the decoding of the two-layer polar codes, suppose the observation channel is a BSC(p); we then have the Markov chain depicted in Figure 4.5.

Firstly, we decode U with knowledge of Y and crossover probability α ∗ β, which is exactly the same as the one-layer Wyner-Ziv problem discussed in Section 4.1. Following the analysis in Section 4.1, we partition the indices into B_U, A_U\B_U and A_U^c, where A_U and B_U are constructed as

A_U = {i : Z_N^(i)(α ∗ β) ≤ 1 − δ} and B_U = {i : Z_N^(i)(α ∗ β ∗ p) ≤ δ}. (4.8)

When decoding U, only the bits with indices in A_U\B_U are transmitted. The rate for decoding U is

R_u = |A_U\B_U|/N = I(X; U) − I(Y; U) = h2(α ∗ β ∗ p) − h2(α ∗ β).



[Figure 4.5: Markov chain U − V − X − Y, extending Figure 4.4 with Y obtained from X through a BSC(p).]

Secondly, we decode V with knowledge of Y and U. For the code design of decoding V, the indices are partitioned into four parts: B_V, A_V\B_V, F_V and D_V. The details of the code construction are given as follows.

Following reasoning similar to that used to find A_V in Section 4.2.2, for sufficiently large sequence length N and any R_{bV} < I(Y; V|U), we can find a set of indices B_V such that |B_V| = N R_{bV} and

B_V = {i : Z(V_i | V^{i−1}, U_1^N, Y_1^N) ≤ δ and Z(V_i | V^{i−1}, U_1^N) ≥ 1 − δ}.

F_V and D_V are the same as defined in the encoding.

[Figure 4.6: Code construction of the two-layer polar codes when decoding v: indices in B_V are decoded from p(v_i | u_1^N, y_1^N, v_1^{i−1}), indices in A_V\B_V are transmitted, indices in F_V are shared frozen bits, and indices in D_V are decoded from p(v_i | u_1^N, v_1^{i−1}).]

The code construction of the two-layer polar codes for the Wyner-Ziv problem is illustrated in Figure 4.6. As seen from Figure 4.6, only {v_i, i ∈ A_V\B_V} is transmitted between the encoder and the decoder. So the rate for decoding v is

R_v = |A_V\B_V|/N = I(X; V|U) − I(Y; V|U),

and we have

I(X; V|U) = H(X|U) + H(V|U) − H(X, V|U),
H(X|U) = h2(α ∗ β),
H(V|U) = h2(α),
H(X, V|U) = H(X, V|U = 0) = H(X, V|U = 1).



With

p(x = 0, v = 0|u = 0) = (1 − α)(1 − β),
p(x = 0, v = 1|u = 0) = αβ,
p(x = 1, v = 0|u = 0) = (1 − α)β,
p(x = 1, v = 1|u = 0) = α(1 − β),

we obtain

H(X, V|U = 0) = −(1 − α)(1 − β)(log(1 − α) + log(1 − β))
− α(1 − β)(log α + log(1 − β))
− (1 − α)β(log(1 − α) + log β)
− αβ(log α + log β)
= h2(α) + h2(β).

Thus, I(X; V|U) = h2(α ∗ β) − h2(β).

Similarly, we have

I(Y; V|U) = h2(α ∗ β ∗ p) − h2(β ∗ p).

Thus,

R_v = h2(α ∗ β) − h2(β) − h2(α ∗ β ∗ p) + h2(β ∗ p).

Since decoding u is the one-layer polar coding for the Wyner-Ziv problem, the rate for decoding u is

R_u = h2(α ∗ β ∗ p) − h2(α ∗ β).

So the combined rate for this two-layer scheme is

R_u + R_v = h2(β ∗ p) − h2(β). (4.9)

From equation (4.9), the best we can hope for is that the two rates, R_u and R_u + R_v, lie on the rate-distortion bound simultaneously.
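The cancellation leading to (4.9) is easy to check numerically. A small sketch, with h2 the binary entropy function and conv the binary convolution a ∗ b = a(1 − b) + (1 − a)b; the parameter values are arbitrary examples:

import numpy as np

def h2(x):
    # Binary entropy function (in bits).
    return 0.0 if x in (0.0, 1.0) else -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def conv(a, b):
    # Binary convolution a * b = a(1-b) + (1-a)b.
    return a * (1 - b) + (1 - a) * b

alpha, beta, p = 0.25, 0.1, 0.3
Ru = h2(conv(conv(alpha, beta), p)) - h2(conv(alpha, beta))
Rv = h2(conv(alpha, beta)) - h2(beta) - h2(conv(conv(alpha, beta), p)) + h2(conv(beta, p))
assert np.isclose(Ru + Rv, h2(conv(beta, p)) - h2(beta))  # equation (4.9)
print(Ru, Rv, Ru + Rv)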

Algorithm 7 describes how the decoding works for the two-layer polar codes. For simplicity, let Z_N^(i)(α, β) denote Z(W_N^(i)) when W is the two-layer channel depicted in Figure 4.4.



Algorithm 7 Two-layer Polar Codes (Decoding)

Input: Pre-defined crossover probabilities α and β, observation y of length N, compressed sequences u and v, BSC crossover probability p, the Bhattacharyya parameters Z_N(α ∗ β), Z_N(α ∗ β ∗ p), Z_N(α, β), Z_N(α, β ∗ p) and Z_N(α), each of length N, sufficiently small number δ
Output: Reconstructed sequences û and v̂

1: û = zeros(N, 1)
2: A_U = {i : Z_N^(i)(α ∗ β) ≤ 1 − δ}
3: B_U = {i : Z_N^(i)(α ∗ β ∗ p) ≤ δ}
4: for i = 1 : N do
5:   if i ∈ A_U\B_U then
6:     û_i = u_i
7:   else if i ∈ B_U then
8:     l = LLR_N^(i)(y_1^N, û_1^{i−1})
9:     if l ≥ 0 then
10:      û_i = 0
11:    else
12:      û_i = 1
13: û = û G_N
14: v̂ = zeros(N, 1)
15: A_V = {i : Z_N^(i)(α, β) ≤ 1 − δ and Z_N^(i)(α) ≥ δ}
16: B_V = {i : Z_N^(i)(α, β ∗ p) ≤ δ and Z_N^(i)(α) ≥ 1 − δ}
17: F_V = {i ∈ A_V^c : Z_N^(i)(α, β) ≥ 1 − δ}
18: D_V = {i ∈ A_V^c : Z_N^(i)(α) ≤ δ}
19: for i = 1 : N do
20:   if i ∈ A_V\B_V then
21:     v̂_i = v_i
22:   else if i ∈ B_V then
23:     l = LLR_N^(i)(û_1^N, y_1^N, v̂_1^{i−1})
24:     if l ≥ 0 then
25:       v̂_i = 0
26:     else
27:       v̂_i = 1
28:   else if i ∈ D_V then
29:     l = LLR_N^(i)(û_1^N, v̂_1^{i−1})
30:     if l ≥ 0 then
31:       v̂_i = 0
32:     else
33:       v̂_i = 1
34: v̂ = v̂ G_N
35: return û and v̂

Notice that we need the two-layer Bhattacharyya parameters, which can be found by Monte-Carlo estimation. Algorithm 8 describes how to estimate the Bhattacharyya parameters Z_N(α, β) for pre-defined crossover probabilities α and β.



Algorithm 8 The Monte-Carlo estimation (two-layer)

Input: Sequence length N, pre-defined crossover probabilities α and β, number of Monte-Carlo iterations Runs
Output: The estimated Z(W_N^(i)), for i = 1, 2, ..., N

1: Z = zeros(N, 1)
2: for r = 1 : Runs do
3:   v = a uniformly random binary vector of length N
4:   u = bsc(v, α)
5:   x = bsc(v, β)
6:   for i = 1 : N do
7:     l = LLR_N^(i)(u_1^N, x_1^N, v_1^{i−1} | v_i)
8:     if l ≥ 0 then
9:       Z_N^i(α, β) = (Z_N^i(α, β)(r − 1) + e^(−l/2))/r
10:    else
11:      Z_N^i(α, β) = (Z_N^i(α, β)(r − 1) + e^(l/2))/r
12: return Z
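The following Python sketch illustrates the principle behind Algorithm 8 for a single channel use (N = 1), where the exact Bhattacharyya parameter Z(W) = Σ_{u,x} sqrt(W(u,x|0)W(u,x|1)) is available for comparison; the full algorithm applies the same averaging to each synthesized channel via the LLR recursion.

import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.2, 0.1

# Transition matrix of V -> (U, X), rows (u, x), columns v, as in (4.2).
a, b = alpha, beta
W = np.array([[(1 - a) * (1 - b), a * b],
              [(1 - a) * b,       a * (1 - b)],
              [a * (1 - b),       (1 - a) * b],
              [a * b,             (1 - a) * (1 - b)]])

Z_exact = np.sum(np.sqrt(W[:, 0] * W[:, 1]))

# Monte-Carlo: draw v, pass it through BSC(alpha) and BSC(beta),
# and average exp(-l/2), where l is the LLR in favour of the true v.
runs, Z_mc = 100_000, 0.0
for _ in range(runs):
    v = int(rng.integers(0, 2))
    u = v ^ int(rng.random() < alpha)   # U = V through BSC(alpha)
    x = v ^ int(rng.random() < beta)    # X = V through BSC(beta)
    row = 2 * u + x                     # row index of (u, x) in W
    l = np.log(W[row, v] / W[row, 1 - v])
    Z_mc += np.exp(-l / 2) / runs

print(Z_exact, Z_mc)  # the two values should be close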

4.3 Simulation Results

4.3.1 One-layer Polar Codes for Wyner-Ziv Problem

In this section, we present and discuss the simulation results for one-layer polar codes for the Wyner-Ziv problem. Here, we assume the source is binary symmetric and that the channel corresponding to the side information is a BSC(0.3). We only consider realizing the Wyner-Ziv rate-distortion function R_WZ(D) instead of its lower convex envelope. Given that the source is binary symmetric and the side information is correlated to the source via a BSC(0.3), the Wyner-Ziv rate-distortion function is R_WZ(D) = h2(D ∗ 0.3) − h2(D).

Figure 4.7 shows the results for the Wyner-Ziv problem. The dashed line represents the Wyner-Ziv rate-distortion function R_WZ(D). The solid line is the lower convex envelope of R_WZ(D) and the point (0, 0.3). The performance dots are for block lengths 128, 256, 512 and 1024. As seen from Figure 4.7, as the block length increases, the rate-distortion points approach the Wyner-Ziv function R_WZ(D). The same observation is made in [16].

4.3.2 Two-layer Polar Codes for Wyner-Ziv Problem

In this section, we discuss the simulation results for two-layer polar codes for the Wyner-Ziv problem. Similarly, we assume the source is binary symmetric and the channel corresponding to the side information is a BSC(0.3). In the code construction, we assume the first-layer distortion is α ∗ β and the second-layer distortion is β. The combined rate for this case is R_u + R_v = h2(β ∗ 0.3) − h2(β). Therefore, the rate-distortion function is the same as in the one-layer case.

Figure 4.8 illustrates the rate-distortion performance of two-layer polar codes for the Wyner-Ziv problem. Figure 4.8a shows the simulation results when the first-layer distortion is α ∗ β = 0.2 and the second-layer distortion is β = 0.05, 0.10. In Figure 4.8b, we assume the first-layer distortion is α ∗ β = 0.3 and the second-layer distortion is β = 0.05, 0.10.



[Figure 4.7: The rate-distortion performance of polar codes for the Wyner-Ziv problem (one layer), showing the WZ function R_WZ(D), its lower convex envelope, and simulated points for block lengths 128, 256, 512 and 1024.]

Similar to the one-layer case, the dashed line is the rate-distortion function R_WZ(D) = h2(D ∗ 0.3) − h2(D), and the solid line is the lower convex envelope of R_WZ(D) and the point (0, 0.3). We are again interested in achieving R_WZ(D). The rate-distortion dots are for block lengths 128, 256, 512 and 1024. As seen from Figure 4.8, as the block length increases, the rate-distortion points approach the Wyner-Ziv function R_WZ(D) more slowly than in Figure 4.7. From equation (4.9), the expected combined rate also lies on the rate-distortion bound. However, it is not always possible for the two rates to lie on the rate-distortion curve simultaneously, as discussed in [19]; some penalty has to be paid in the process of successive refinement.

Another observation is that if we set the distortion α ∗ β larger, the rate-distortion performance improves. As shown in Figures 4.8a and 4.8b, the rate-distortion dots for the case α ∗ β = 0.30 approach R_WZ(D) faster than those for α ∗ β = 0.20. A possible reason is that, for different distortion parameters, the channel polarization and the Bhattacharyya parameters differ, and the codes U and V may describe the source better when the first-layer distortion α ∗ β is larger.



[Figure 4.8: The rate-distortion performance of two-layer polar codes for the Wyner-Ziv problem, showing R_WZ(D), its lower convex envelope, and simulated points for block lengths 128, 256, 512 and 1024. Panel (a): α ∗ β = 0.20, β = 0.05, 0.10. Panel (b): α ∗ β = 0.30, β = 0.05, 0.10.]



Chapter 5

Identification System

In this chapter, we consider polar codes for identification systems. We start with a discussion of the model of an identification system (Section 5.1). We then introduce the application of polar codes in an identification system in Section 5.2. In Section 5.3, we show some simulation results.

5.1 Model of Identification System

In [23], the capacity of an identification system was discussed, and this section is based on that work. Suppose there are M individuals in an identification system. For each individual with an index w ∈ {1, 2, ..., M}, there is a corresponding biometric data sequence x^N = (x_1, x_2, ..., x_N) with x_i ∈ X for i = 1, ..., N. We assume that each biometric data sequence x^N is generated i.i.d. according to the distribution {P_X(x) : x ∈ X}. Therefore, the distribution of X^N(w) is

Pr{X^N(w) = x^N} = ∏_{i=1}^N P_X(x_i), for all x^N ∈ X^N, w ∈ {1, 2, ..., M}.

[Figure 5.1: Enrollment phase of an identification system: each x^N(w) is passed through P_{U|X}(u|x) to produce u^N(w).]



In the enrollment phase, a noisy version of the original data is enrolled in the database. Assume the enrollment mapping is equivalent to a memoryless channel {U, P_{U|X}(u|x), X}, where U is the output alphabet. Then for any user w ∈ {1, 2, ..., M} with biometric data sequence x^N(w) = (x_1(w), x_2(w), ..., x_N(w)), the distribution of the enrolled data U^N(w) is

Pr{U^N(w) = u^N | X^N(w) = x^N} = ∏_{i=1}^N P_{U|X}(u_i | x_i(w)), for all u^N = (u_1, u_2, ..., u_N) ∈ U^N.

The output sequences u^N(w) for all w ∈ {1, 2, ..., M} are stored in the database, which is accessible to the decoder in the identification phase. Figure 5.1 shows the enrollment mapping.
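A minimal simulation sketch of this enrollment model, under the assumptions used later in this chapter (binary symmetric source, BSC enrollment channel); M, N and q_enroll are illustrative values:

import numpy as np

rng = np.random.default_rng(2)
M, N = 1000, 512      # number of users and sequence length
q_enroll = 0.1        # crossover probability of the enrollment channel P_{U|X}

# Biometric source: each user's x^N is i.i.d. Bernoulli(1/2).
X = rng.integers(0, 2, size=(M, N))

# Enrollment: each x^N(w) is passed through a memoryless BSC and stored.
U = X ^ (rng.random((M, N)) < q_enroll)

print(U.shape)  # the database available in the identification phase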

The identification phase consists of two main parts: observing an unknown user and decoding the observed data. First, an unknown user w ∈ {1, 2, ..., M} is observed through a memoryless channel {Y, P_{Y|X}(y|x), X}, where Y is the identification output alphabet. We have the probability distribution

Pr{Y^N(w) = y^N | X^N(w) = x^N} = ∏_{i=1}^N P_{Y|X}(y_i | x_i(w)), for all y^N = (y_1, y_2, ..., y_N) ∈ Y^N.

Based on the output sequence and the established enrollment database, the decoder gives an estimate of the index of the unknown individual, i.e.,

ŵ = d(y^N, u^N(1), u^N(2), ..., u^N(M)).

[Figure 5.2: Identification phase in an identification system: x^N(w) is observed through P_{Y|X}(y|x) and the decoder outputs ŵ from y^N.]

Figure 5.2 illustrates the identification phase. Note that in Figure 5.2, the decoder has access to the stored data.

5.2 Polar Codes for Identification Systems

In this section, we discuss the implementation of polar codes for identificationsystems.

5.2.1 Basic Identification Systems

Combining the enrollment phase and the identification phase discussed in Section 5.1, we obtain the model for a complete identification system, shown in Figure 5.3. Here, N is the sequence length of the source, R is the enrollment rate, and PU refers to a processing unit.



[Figure 5.3: A basic identification system: the users' data X^N(1), ..., X^N(M) are enrolled as U^N(1), ..., U^N(M) in the database; an unknown user's X^N(w) is observed through p_{Y|X} as Y^N, and the processing unit PU outputs (ŵ, X̂^N).]

The main purpose of the enrollment mapping is to compress the source biometric data in an efficient way. Here, we apply the method of polar codes for lossy source coding discussed in Chapter 3 as the enrollment mapping.

For the identification phase, we use the maximum mutual information (MMI) decoder [25]. It is worth noting that there is room for improvement over the MMI decoder [26]: more reliable decoding schemes, such as universal decoding, exist when lossy source compression is involved. The reason for choosing MMI decoding here is that it is a classic decoding scheme and requires little computation. The idea is to calculate the mutual information between the observed data sequence y^N and each data sequence u^N stored in the database. The estimated user ŵ is the one that maximizes the mutual information I(y^N; u^N), i.e.,

ŵ = argmax_{i∈{1,2,...,M}} I(y^N; u^N(i)).
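A minimal sketch of MMI identification over binary sequences, where the mutual information is computed from the empirical joint distribution of (y_i, u_i); the function names are illustrative:

import numpy as np

def empirical_mi(y, u):
    # Empirical mutual information (in bits) of two binary sequences.
    p = np.array([[np.mean((y == a) & (u == b)) for b in (0, 1)] for a in (0, 1)])
    py, pu = p.sum(axis=1), p.sum(axis=0)
    return sum(p[a, b] * np.log2(p[a, b] / (py[a] * pu[b]))
               for a in (0, 1) for b in (0, 1) if p[a, b] > 0)

def mmi_identify(y, database):
    # Return the index maximizing I(y^N; u^N(i)) over the stored sequences.
    return int(np.argmax([empirical_mi(y, u) for u in database]))

rng = np.random.default_rng(3)
db = rng.integers(0, 2, size=(5, 256))
y = db[2] ^ (rng.random(256) < 0.2)  # noisy observation of user 2
print(mmi_identify(y, db))           # should print 2 with high probability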

5.2.2 Wyner-Ziv Scenario Based Identification Systems

Observe that the model of an identification system depicted in Figure 5.3 is similar to the Wyner-Ziv problem shown in Figure 4.1. Here, the observation, which is correlated to the source sequence, can be seen as the side information. In the identification phase, the processing unit has access to both the compressed data and the observation, so this process can be seen as a Wyner-Ziv problem, and it is natural to apply the polar codes for the Wyner-Ziv problem in the decoding process. In this scenario, we implement the compression scheme discussed in Section 4.1 in the enrollment phase. In the identification phase, we again use maximum mutual information (MMI) decoding. For each



enrolled user, we do the reconstruction and calculate the mutual information between the reconstruction and the observation. The estimated user is the one that maximizes the mutual information. This process can be expressed as

ŵ = argmax_{i∈{1,2,...,M}} I(y^N; x̂^N(i)).

5.2.3 Two-layer Identification Systems

In this section, we study the two-layer identification system. The two-layer scheme is motivated by the fact that, in practical applications, a large number of users might be involved. The storage might not be large enough to enroll so many individuals; additionally, as the amount of collected data increases, data processing may slow down or crash. To avoid this problem, in addition to compressing the data efficiently, storing the data in separate databases is an option. Such an identification system is modeled in Figure 5.4. Similar to the identification systems described in Section 5.2.2, the decoding processing units have access to both the compressed data and a noisy observation of the source sequence, where the observation can be seen as the side information. Therefore, in the enrollment and identification we use the two-layer polar codes for the Wyner-Ziv problem for compression and reconstruction. Here, N is the sequence length of the source, R_u is the rate of the first-layer compression, R_v is the rate of the second layer, and PU1, PU2 refer to two processing units. In the enrollment phase, we first generate the first-layer output U^N; then we find the second-layer output V^N based on the source X^N and the first-layer output U^N. In the identification phase, the first processing unit PU1 does the first-layer reconstruction with knowledge of the observation and the first database U^N. Then the second processing unit PU2 does the second-layer reconstruction knowing the first-layer reconstruction and both databases.

As seen from Figure 5.4, there are two storage units. The first-layer storage {U^N(w), w ∈ {1, 2, ..., M}} can be seen as information describing the main features of the users' biometric data X^N, while the second-layer storage {V^N(w), w ∈ {1, 2, ..., M}} stores the refinement information. A natural problem is that, as the number of users increases, the decoding computation becomes costly in both space and time. To address this, clustering methods were considered in [28], [29] and [30], and we also adopt the idea of clustering here. The first processing unit PU1 outputs a list L: a set of user indices that have the highest mutual information between the first-layer reconstructed sequence and the observation y^N. In a specific identification system, the list size is fixed. In the second phase, PU2 reconstructs the refined version of the sequences of the users whose indices belong to the list L. The final output is then the index in the list which maximizes the mutual information between the refined reconstructed sequence and the observation y^N, i.e.,

ŵ = argmax_{i∈L} I(y^N; x̂^N(i)).
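The two-stage search can be sketched as follows; score1 and score2 are placeholders for the mutual information computed against the first-layer and the refined reconstructions, respectively:

import numpy as np

def two_stage_identify(y, M, list_size, score1, score2):
    # Stage 1: coarse scores for all users, keep the list_size best.
    coarse = np.array([score1(y, w) for w in range(M)])
    L = np.argsort(coarse)[-list_size:]
    # Stage 2: refine only the listed users and return the best index.
    refined = [(score2(y, int(w)), int(w)) for w in L]
    return max(refined)[1]

# Toy demo with dummy scoring functions that favour user 7.
w_hat = two_stage_identify(y=None, M=20, list_size=5,
                           score1=lambda y, w: -abs(w - 7),
                           score2=lambda y, w: -abs(w - 7))
print(w_hat)  # 7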

5.2.4 Two-layer Identification System with Pre-processing

In practical scenarios, biometric data such as Face IDs can contain a large amount of information, and it might be difficult to deal with the raw observed data directly.



[Figure 5.4: An example of a two-layer identification system: the users' data are enrolled in two databases U^N(1), ..., U^N(M) and V^N(1), ..., V^N(M); PU1 uses Y^N and U^N(L) to output a list L, and PU2 uses V^N(L) to output (ŵ, X̂^N).]

A possible solution to this problem is to add a pre-processing unit before the decoders. The main task of the pre-processing unit is to quantize the raw data and extract the main features. Such an identification system is modeled in Figure 5.5.

For simplicity, we model the pre-processing unit as a simple channel, such as a BSC. Apart from the pre-processing unit, the enrollment and identification tasks are the same as in the case discussed in Section 5.2.3.

5.3 Simulation Results and Discussion

In this section, we give and discuss the simulation results of polar codes foridentification systems.

5.3.1 One-layer Polar Codes for Identification Systems

First, we discuss the simulation results for one-layer identification systems. Here we assume the source sequences are binary symmetric and the compression rate is R = 0.5. We assume the observation channels are binary symmetric channels with crossover probabilities from 0.1 to 0.4 in steps of 0.05.

Figure 5.5 shows the performance of the identification system based on polar codes for lossy source coding with different numbers of users: (a) 500, (b) 1000, (c) 1500 and (d) 2000. As seen from Figure 5.5, as the block length increases, the identification error rates go down and sometimes even reach 0.

Another observation is that, across the different BSCs, the lower the crossover probability, the smaller the error rate. This result is intuitive.



[Figure 5.5: An example of a two-layer identification system with pre-processing: compared with Figure 5.4, the observation Y^N is first quantized by a pre-processing channel p_{Z|Y} into Z^N before being passed to PU1 and PU2.]

Figure 5.6 compares the performance of the identification system with the SC encoder and with the SCL encoder with list size 4 (note that the error rates go to 0 at block length 1024 in both cases). As seen from the two curves, the error rates are reduced when the list-based successive cancellation encoder is used. This result is intuitive, since the rate-distortion pairs of the SCL encoder approach the rate-distortion bound faster: if we implement the SCL encoder in the enrollment phase, the compressed sequences describe the source better.

5.3.2 Two-layer Polar Codes for Identification Systems

Figure 5.7 shows the identification error rates when two-layer Wyner-Ziv coding with polar codes is applied. Here we assume there are 100 users and let the distortions be α ∗ β = 0.30 and β = 0.10. In the second stage of decoding, the output list includes 10% of the user indices. For the identification system with pre-processing, we assume the quantization channel is a BSC(0.05). From Figure 5.7, we can see that the error-rate curve is convex, which differs from the concave curves in Figure 5.5. A possible reason is that in the second stage we reconstruct based on a subset of the database, and the error-rate analysis does not necessarily follow that of one-layer identification.

5.4 Complexity Analysis

Combining the results in Section 2.7, we have the following complexity analysis for identification systems.



[Figure 5.5: Polar codes for a one-layer identification system: identification error rate versus block length N = 64, 128, 256, 512, 1024 at R = 0.5, for observation channels BSC(0.1) through BSC(0.4) in steps of 0.05. Panels: (a) M = 500 users, (b) M = 1000 users, (c) M = 1500 users, (d) M = 2000 users.]



[Figure 5.6: Polar codes for a one-layer identification system, SC vs. SCL with list size 4: identification error rate versus block length N = 64, 128, 256, 512, 1024 for BSC(0.35), M = 500, R = 0.5.]

First consider the basic identification system depicted in Figure 5.3. Suppose there are M users in the database, the block length is N, and the compression rate is R. Then the complexity of identifying one observation is O(M(1 − R)N log N).

For the two-layer identification systems shown in Figures 5.4 and 5.5, suppose the compression rates for the first-layer and second-layer databases are R_u and R_v respectively, and the list L output by PU1 contains the indices of M_L users. Then the complexity of identifying one observation is O((M(1 − R_u) + M_L(1 − R_v))N log N).
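As an illustration, the two cost expressions can be compared directly; the parameter values below are arbitrary examples, not the simulation settings:

import numpy as np

def one_layer_ops(M, N, R):
    # Basic system: O(M (1 - R) N log N).
    return M * (1 - R) * N * np.log2(N)

def two_layer_ops(M, M_L, N, Ru, Rv):
    # Two-layer system: O((M (1 - Ru) + M_L (1 - Rv)) N log N).
    return (M * (1 - Ru) + M_L * (1 - Rv)) * N * np.log2(N)

M, N = 2000, 1024
print(one_layer_ops(M, N, R=0.5))
print(two_layer_ops(M, M_L=M // 10, N=N, Ru=0.5, Rv=0.5))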

A practical identification system might involve a large number of individuals, i.e., M could be very large, in which case both the one-layer and the two-layer schemes would take a relatively long time. To address this, more advanced successive cancellation methods might be adopted, such as simplified successive cancellation (SSC) and fast simplified successive cancellation (FSSC).



[Figure 5.7: Polar codes for a two-layer identification system: identification error rate versus block length N = 64, 128, 256, 512, 1024. Panel (a): without pre-processing. Panel (b): with pre-processing.]



Chapter 6

Conclusion and Future Work

6.1 Conclusion

In this thesis, we have discussed polar codes for both channel coding and source coding. In source coding, the encoding operation is the same as the decoding operation in channel coding. We also reviewed the result that polar codes are optimal for lossy source coding, so they can be used for efficient data compression, and we showed that a list-based successive cancellation encoder improves the compression performance, in the sense that it approaches the rate-distortion bound faster. We then discussed polar codes for the Wyner-Ziv problem. In this scenario, a lower rate is achieved compared with ordinary source coding using polar codes; the price is that some of the bits have to be recovered by successive cancellation at the decoder. Next, we discussed two-layer polar codes for the Wyner-Ziv problem, a compression scheme that generates two layers of outputs. According to the analysis, the one-layer and two-layer compression schemes should approach the same rate-distortion bound if the one-layer compression rate and the combined compression rate of the two-layer scheme are set equal. However, in the simulations, the two-layer scheme approaches the rate-distortion bound more slowly than the one-layer scheme, because a penalty is paid in the successive refinement.

After introducing polar codes for compression in these different scenarios, we applied these compression schemes to identification systems, ran simulations and analyzed the results. The conclusion is that it is feasible to use polar codes in identification systems. If there is a small number of users and one storage unit is enough, the one-layer scheme shows better performance. However, if a large number of users is involved and one storage unit is not enough, the two-layer scheme is preferred.



6.2 Future Work

In practice, an identification system involves a large number of users, and the reconstruction method we applied can take a very long time. In this thesis, we assumed at most 2000 users, which is not enough for realistic systems; moreover, more users mean more storage and computation. To make the approach practical, more efficient successive cancellation methods could be considered, such as SSC and FSSC. These two improved successive cancellation methods cost less time than the standard SC method while achieving the same performance.

Apart from more efficient code design, another interesting direction is more advanced identification mappings. We used the basic maximum mutual information decoder for identification, which is not optimal when lossy source coding is involved. There is still room for improvement over the MMI decoder; a more advanced decoder, such as a universal decoder, could be implemented.



Bibliography

[1] J. Wayman, A. Jain, D. Maltoni, and D. Maio, "An Introduction to Biometric Authentication Systems," 2005, pp. 1-20.

[2] F. Al-harby, R. Qahwaji, and M. Kamala, "Secure Biometrics Authentication: A brief review of the Literature."

[3] S. Prabhakar, S. Pankanti, and A. K. Jain, "Biometric Recognition: Security and Privacy Concerns," IEEE Security & Privacy, vol. 1, no. 2, pp. 33-42, 2003.

[4] Global Biometrics Market - Competition Forecast & Opportunities, 2012-2022, [Online]. Available: https://www.prnewswire.com/news-releases/global-biometrics-market—competition-forecast–opportunities-2012–2022-300536101.html [Accessed 2018]

[5] National Research Council. 2010. Biometric Recognition: Challenges andOpportunities. Washington, DC: The National Academies Press.

[6] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051-3073, 2009.

[7] I. Tal and A. Vardy, ”List Decoding of Polar Codes,” in IEEE Transactionson Information Theory, vol. 61, no. 5, pp. 2213-2226, May 2015.

[8] K. Chen, K. Niu and J. R. Lin, "List successive cancellation decoding of polar codes," in Electronics Letters, vol. 48, no. 9, pp. 500-501, April 26, 2012.

[9] K. Chen, K. Niu and J. Lin, ”Improved Successive Cancellation Decodingof Polar Codes,” in IEEE Transactions on Communications, vol. 61, no. 8,pp. 3100-3107, August 2013.

[10] S. B. Korada and R. Urbanke, ”Polar codes are optimal for lossy sourcecoding,” 2009 IEEE Information Theory Workshop, Taormina, 2009, pp.149-153.

[11] R. Kizhakkumkara Muhamad, "Polar Codes for secure binary Wyner-Ziv source coding," Master thesis, Stockholm, Sweden, 2017 [Online]. Available: http://www.divaportal.se/smash/get/diva2:1146199/FULLTEXT01.pdf

[12] C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948.



[13] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. of the IEEE Int. Symposium on Inform. Theory, Seoul, South Korea, July 2009, pp. 1493-1495.

[14] S. B. Korada, "Polar codes for channel and source coding," Ph.D. dissertation, Lausanne, Switzerland, 2009 [Online]. Available: http://library.epfl.ch/theses/?nr=4461

[15] A. Wyner and J. Ziv, ”The rate-distortion function for source coding withside information at the decoder,” in IEEE Transactions on Information The-ory, vol. 22, no. 1, pp. 1-10, Jan 1976.

[16] S. B. Korada and R. Urbanke, ”Polar codes for Slepian-Wolf, Wyner-Ziv,and Gelfand-Pinsker,” 2010 IEEE Information Theory Workshop on Infor-mation Theory (ITW 2010, Cairo), Cairo, 2010, pp. 1-5.

[17] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. of the IEEE Int. Symposium on Inform. Theory, Seoul, South Korea, July 2009, pp. 1493-1495.

[18] V. N. Koshelev, "Hierarchical coding of discrete sources," Probl. Pered. Inform., vol. 16, no. 3, pp. 31-49, 1980. English translation: Probl. Inform. Transm., vol. 16, pp. 186-203, 1980.

[19] Y. Steinberg and N. Merhav, ”On successive refinement for the Wyner-Zivproblem,” in IEEE Transactions on Information Theory, vol. 50, no. 8, pp.1636-1654, Aug. 2004.

[20] M. T. Vu, T. J. Oechtering and M. Skoglund, ”Polar code for secure Wyner-Ziv coding,” 2016 IEEE International Workshop on Information Forensicsand Security (WIFS), Abu Dhabi, 2016, pp. 1-6.

[21] J. Honda and H. Yamamoto, ”Polar Coding Without Alphabet Extensionfor Asymmetric Models,” in IEEE Transactions on Information Theory, vol.59, no. 12, pp. 7829-7838, Dec. 2013.

[22] R. Mori and T. Tanaka, ”Channel polarization on q-ary discrete memory-less channels by arbitrary kernels,” 2010 IEEE International Symposium onInformation Theory, Austin, TX, 2010, pp. 894-898.

[23] F. Willems, T. Kalker, J. Goseling and J. P. Linnartz, "On the capacity of a biometrical identification system," IEEE International Symposium on Information Theory, 2003. Proceedings., 2003, p. 82.

[24] F. Farhadzadeh, F. M. J. Willems and S. Voloshynovskiy, ”Fundamentallimits of identification: Identification rate, search and memory complexitytrade-off,” 2013 IEEE International Symposium on Information Theory, Is-tanbul, 2013, pp. 1252-1256.

[25] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2011.

[26] N. Merhav, ”Reliability of Universal Decoding Based on Vector-QuantizedCodewords,” in IEEE Transactions on Information Theory, vol. 63, no. 5,pp. 2696-2709, May 2017.



[27] C. Leroux, I. Tal, A. Vardy and W. J. Gross, ”Hardware architectures forsuccessive cancellation decoding of polar codes,” 2011 IEEE InternationalConference on Acoustics, Speech and Signal Processing (ICASSP), Prague,2011, pp. 1665-1668

[28] F. M. J. Willems, ”Searching methods for biometric identification systems:Fundamental limits,” 2009 IEEE International Symposium on InformationTheory, Seoul, 2009, pp. 2241-2245.

[29] E. Tuncel, ”Capacity/Storage Tradeoff in High-Dimensional IdentificationSystems,” in IEEE Transactions on Information Theory, vol. 55, no. 5, pp.2097-2106, May 2009.

[30] F. Farhadzadeh, F. M. J. Willems and S. Voloshynovskiy, ”Fundamentallimits of identification: Identification rate, search and memory complexitytrade-off,” 2013 IEEE International Symposium on Information Theory, Is-tanbul, 2013, pp. 1252-1256.
