The Role of Feedback in Communications
Solmaz Torabi
Dept. of Electrical and Computer Engineering, Drexel University, [email protected]
Advisor: Dr. John M. Walsh
April 22, 2015
References I
M. Horstein, “Sequential transmission using noiseless feedback,” IEEE Transactions on Information Theory, vol. 9, no. 3, pp. 136–143, 1963.
J. Schalkwijk and T. Kailath, “A coding scheme for additive noise channels with feedback–I: No bandwidth constraint,” IEEE Transactions on Information Theory, vol. 12, no. 2, pp. 172–182, 1966.
O. Shayevitz and M. Feder, “Optimal feedback communication via posterior matching,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1186–1222, 2011.
N. Gaarder and J. K. Wolf, “The capacity region of a multiple-access discrete memoryless channel can increase with feedback (corresp.),” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 100–102, 1975.
G. Kramer, “Directed information for channels with feedback,” Ph.D. dissertation, University of Manitoba, Canada, 1998.
References II
A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
R. Venkataramanan and S. S. Pradhan, “Source coding with feed-forward: Rate-distortion theorems and error exponents for a general source,” IEEE Transactions on Information Theory, vol. 53, no. 6, pp. 2154–2179, 2007.
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
DMC with feedback
[Block diagram: M → Encoder → X_i → p(y|x) → Y_i → Decoder → M̂, with feedback Y^{i−1} from the channel output to the encoder]
I Memoryless channel: P_{Y_n | X^n, Y^{n−1}}(· | x^n, y^{n−1}) = P_{Y|X}(· | x_n)
I Messages: M
I Encoding function: g_i : M × Y^{i−1} → X
I X_n is a function of (M, Y_1, Y_2, ..., Y_{n−1})
Point to point feedback communication system
[Block diagram: M → Encoder → X_i → p(y|x) → Y_i → Decoder → M̂, with feedback Y^{i−1} from the channel output to the encoder]
The channel is memoryless if P(y_n | x^n, y^{n−1}) = P(y_n | x_n)
The channel is used without feedback if P(x_n | x^{n−1}, y^{n−1}) = P(x_n | x^{n−1})
DMC without feedback: P(y^N | x^N) = ∏_{n=1}^{N} P(y_n | x_n)
Point to point feedback communication system
[Block diagram: M → Encoder → X_i → p(y|x) → Y_i → Decoder → M̂, with feedback Y^{i−1} from the channel output to the encoder]
I If the channel is memoryless, the feedback provides no information that can help increase the rate:
C_FB = max_{p(x)} I(X; Y) = C
Capacity of Memoryless Feedback Channel
I Shannon 56: Feedback does not increase the capacity of a memoryless channel
I Simplifying schemes for attaining it:
  I Horstein 63: developed a recursive coding strategy for the BSC with noiseless feedback (sequential coding scheme, varying block length)
  I Schalkwijk-Kailath 66: AWGN channel
  I Shayevitz 2008: extends the SK and Horstein schemes to general memoryless channels
I Gaarder-Wolf 75: feedback enlarges the capacity region of multiuser channels
Point to point feedback communication system
Feedback can:
I Simplify coding scheme
I Improve reliability (decreases error prob. much faster)
I Increase capacity of channels with memory
I Enlarge capacity region of multiuser channels (Gaarder-Wolf 1975)
Iterative refinement for BEC
I First send a message at a rate higher than the channel capacity (without coding)
I Then iteratively refine the receiver’s knowledge about the message
[Figure: BEC(p) transition diagram: inputs 0 and 1 pass through with probability 1 − p and are erased to ? with probability p]
n + pn + p²n + ... = n / (1 − p) channel uses suffice to transmit n bits reliably
I We can achieve the capacity C = 1 − p by simply retransmitting each bit after it is erased.
I There is no need for sophisticated error-correcting codes.
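The retransmission argument can be checked numerically. The sketch below (illustrative, not from the talk; the function name and parameters are my own) simulates delivering n bits over a BEC(p) with a retransmit-until-received strategy and compares the empirical rate with C = 1 − p:

```python
import random

def bec_retransmit(n_bits, p, rng):
    """Total channel uses to deliver n_bits over a BEC(p) when each
    erased bit is simply retransmitted (the feedback link tells the
    sender which bits were erased, so no coding is needed)."""
    uses = 0
    for _ in range(n_bits):
        while True:
            uses += 1
            if rng.random() >= p:   # bit got through un-erased
                break
    return uses

rng = random.Random(0)
n, p = 100_000, 0.3
uses = bec_retransmit(n, p, rng)
rate = n / uses          # empirical bits per channel use
capacity = 1 - p         # C = 1 - p for the BEC
```

With p = 0.3 the empirical rate should come out very close to 0.7 bits per channel use.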
Noiseless binary forward channel
A binary search algorithm provides an effective procedure for transmitting the information involved in the source's choice.
I The receiver starts out with a uniform prior distribution for the selected message point
I The a priori median of the receiver distribution is m_0 = 1/2
I Suppose 1 was sent. Hence, the new receiver distribution is uniform over the interval (1/2, 1).
[Figure: the interval [0, 1] with message point θ = 0.101... (binary), located by the successive medians 1/2, 3/4, 5/8]
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Horstein Scheme
I Horstein developed a recursive coding strategy for the BSC withnoiseless feedback
I Horstein's feedback scheme can be viewed as a binary search algorithm with lies.
I The transmitter sends a 0 on the (i + 1)st transmission when the true message point θ is to the left of the current median m_i, and a 1 otherwise.
I However, since crossovers can occur in the channel, the transmitter does not know what the receiver's current median m_i is.
I A noiseless feedback channel is used to provide the transmitter withthis information.
Horstein Scheme
I Divide the interval [0, 1] into 2^{nR} equidistant subintervals.
I Represent each message by the midpoint of each interval.
I The receiver has no prior knowledge of the location of θ, so the receiver density is initially uniform: f_0(θ) = 1 for θ ∈ [0, 1]
x_1 = g(θ_0) = 1 if θ_0 > 1/2, and 0 otherwise
[Figure: the initial uniform density f(θ) = 1 on [0, 1], with median m_0 = 1/2]
Horstein Scheme
I Assume x_1 = 1 is sent through the channel. It gets corrupted with probability p.
I After the channel output is observed, the receiver distribution and median are updated.
I Through the noiseless feedback, the encoder learns the distribution.
f(θ | y_1) = f(θ) p(y_1 | θ) / ∫_0^1 f(θ) p(y_1 | θ) dθ
I Assume y_1 = 1 is received.
  I For 0 ≤ θ ≤ 1/2, f(θ | y_1 = 1) = 2p
  I For 1/2 ≤ θ ≤ 1, f(θ | y_1 = 1) = 2p̄
Horstein Scheme
I The encoder transmits 1 if θ > median of f(θ | y_1), and 0 otherwise.
[Figure: evolution of the posterior for X_1 = 1, Y_1 = 1; X_2 = 0, Y_2 = 0; X_3 = 1: the uniform f(θ) with median m_0, then f_{θ|Y_1}(θ | 1) taking values 2p and 2p̄ around m_1, then f_{θ|Y^2}(θ | 10) taking values 4pp̄ and 4p̄² around m_2]
I Terminates when most of the probability mass is concentrated in the neighborhood of one of the possible message points
Horstein Scheme
The scheme admits a simple recursive structure:
f(θ | y^{i−1}) = 2p̄ f(θ | y^{i−2}) if θ lies on the side of the median of f(θ | y^{i−2}) indicated by y_{i−1}, and 2p f(θ | y^{i−2}) otherwise
Terminates when the receiver distribution is sufficiently steep.
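As a sanity check, here is a small discretized sketch of the recursion (my own illustrative implementation on a finite grid, not Horstein's original analysis): the transmitter compares θ with the receiver's posterior median, the BSC flips each bit with probability p, and the receiver applies the 2p / 2p̄ reweighting.

```python
import random

def horstein(theta, p, n_uses, grid_size=4096, seed=1):
    """Discretized sketch of the Horstein recursion over a BSC(p) with
    noiseless feedback.  The transmitter sends 1 iff theta lies right of
    the receiver's posterior median; the receiver multiplies the posterior
    by (1 - p) on the half agreeing with the received bit and by p on the
    other half, then renormalizes (the 2p / 2p-bar update)."""
    rng = random.Random(seed)
    pts = [(i + 0.5) / grid_size for i in range(grid_size)]
    w = [1.0 / grid_size] * grid_size        # uniform prior on [0, 1]
    for _ in range(n_uses):
        acc, m = 0.0, grid_size - 1          # locate the posterior median
        for i, wi in enumerate(w):
            acc += wi
            if acc >= 0.5:
                m = i
                break
        median = pts[m]
        x = 1 if theta > median else 0            # transmitted bit
        y = x ^ (1 if rng.random() < p else 0)    # BSC crossover
        for i in range(grid_size):
            agree = (pts[i] > median) == (y == 1)
            w[i] *= (1 - p) if agree else p
        total = sum(w)
        w = [wi / total for wi in w]
    return sum(wi * xi for wi, xi in zip(w, pts))  # posterior mean

est = horstein(theta=0.3125, p=0.1, n_uses=60)
```

After 60 channel uses at p = 0.1, the posterior mean should land close to the true message point.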
[Figure: the receiver CDF of θ, with medians m_0, m_1 and crossover probability p]
Horstein Scheme - Decoding
I The decoder uses maximum a posteriori decoding
I It finds the interval of length 2^{−nR} that maximizes
∫_β^{β + 2^{−nR}} f(θ | y^n) dθ
Horstein Scheme - Error probability
I By analyzing the evolution of f(θ | Y^i), i ∈ [1 : n], based on the iterated function system, it can be shown that
p(θ ∉ [β, β + 2^{−nR}]) → 0 as n → ∞ if R < C
I With high probability, θ(M) is the unique message point within [β, β + 2^{−nR})
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Block Feedback Coding Scheme for BSC
Ahlswede 1973:
I Implement the iterative refinement at the block level
I The encoder initially transmits an uncoded block of information
I It then refines the receiver’s knowledge about it in subsequent blocks
Block Feedback Coding Scheme for BSC
I Tx. 1: Sends N uncoded data bits over channel.
I Ch. 1: Adds (modulo-2) N samples of Bern(p) noise
I Rx. 1: Feeds its N noisy observations back to Tx.
I Tx. 2:
  (a) Finds the N samples of noise added by the channel.
  (b) Compresses the noise into NH(p) new data bits.
  (c) Sends these data bits uncoded over the channel.
I Ch.2: Adds (modulo-2) NH(p) samples of Bern(p) noise.
I Rx.2: Feeds its NH(p) noisy observations back to Tx
Block Feedback Coding Scheme for BSC
The number of channel inputs used to send the N bits would be
N + NH(p) + NH(p)² + ... = N / (1 − H(p)),
which corresponds to a rate of 1 − H(p), the capacity of the BSC(p).
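The geometric series can be evaluated directly. The sketch below (illustrative; `Hb` and `block_feedback_rate` are my own names) truncates the sum after a finite number of refinement rounds and compares the resulting rate with 1 − H(p):

```python
import math

def Hb(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def block_feedback_rate(p, rounds):
    """Rate of the block feedback scheme after `rounds` refinement
    blocks: per source bit it spends sum_k H(p)^k channel uses, which
    tends to 1 / (1 - H(p)) as rounds grows."""
    uses_per_bit = sum(Hb(p) ** k for k in range(rounds))
    return 1.0 / uses_per_bit

p = 0.11
rate = block_feedback_rate(p, rounds=50)
capacity = 1 - Hb(p)     # BSC(p) capacity
```

Since H(p) < 1 for p ≠ 1/2, the truncated rate converges to capacity geometrically fast in the number of rounds.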
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Robbins-Monro procedure
How can we determine θ, a zero of a function g(x), without knowing the shape of the function?
I The observations are noisy
I Instead of g(x), one obtains Y(x) = g(x) + Z
Robbins-Monro procedure
X_{n+1} = X_n − a_n Y_n(X_n), n = 1, 2, ...
I where Σ a_n = ∞ and Σ a_n² < ∞ ⇒ X_n → θ almost surely
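A minimal sketch of the iteration (my own illustrative code, assuming the linear g(x) = x − θ used on the next slides and unit-variance Gaussian observation noise):

```python
import random

def robbins_monro(g, x1, n_steps, noise_std, seed=0):
    """Robbins-Monro iteration X_{n+1} = X_n - a_n * Y_n(X_n) with
    a_n = 1/n, where Y_n(x) = g(x) + Z_n is a noisy observation of g.
    The steps satisfy sum a_n = infinity and sum a_n^2 < infinity."""
    rng = random.Random(seed)
    x = x1
    for n in range(1, n_steps + 1):
        y = g(x) + rng.gauss(0.0, noise_std)
        x -= y / n
    return x

theta = 0.7
root = robbins_monro(lambda x: x - theta, x1=0.5,
                     n_steps=20_000, noise_std=1.0)
```

The iterate should converge to the zero θ despite never observing g(x) without noise.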
[Figure: the line g(x) = x − θ with noisy observations Y_1(X_1) = g(X_1) + Z_1, and the iterates X_1, X_2, X_3 produced with step sizes a_1, a_2]
Schalkwijk-Kailath scheme
Put the straight line g(x) = x − θ
I Start with X_1 = 1/2, and send the receiver the number g(X_1) = (X_1 − θ)
I The receiver obtains the number Y_1(X_1) = (X_1 − θ) + Z_1, where Z_1 ∼ N(0, 1)
[Figure: the interval [0, 1] with θ and X_1 on the line g(X) = X − θ, and the noisy observation Y_1(X_1) = g(X_1) + Z_1]
Schalkwijk-Kailath scheme
The recursion is easily solved to yield
X_{n+1} = θ − (1/n) Σ_{i=1}^{n} Z_i ∼ N(θ, 1/n)
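The claimed distribution can be checked empirically. This sketch (illustrative; the parameter values are my own) runs the recursion X_{n+1} = X_n − (1/n) Y_n(X_n) many times and compares the empirical mean and variance of the final iterate with θ and 1/n:

```python
import random
import statistics

rng = random.Random(2)
theta, n, trials = 0.25, 50, 4000
finals = []
for _ in range(trials):
    x = 0.5                                    # X_1 = 1/2
    for i in range(1, n + 1):
        y = (x - theta) + rng.gauss(0.0, 1.0)  # Y_i(X_i) = g(X_i) + Z_i
        x -= y / i                             # step size a_i = 1/i
    finals.append(x)

mean_hat = statistics.fmean(finals)     # should be close to theta
var_hat = statistics.pvariance(finals)  # should be close to 1/n
```

Note that the starting point X_1 drops out after the first step for this linear g, so the final iterate is exactly θ minus the running average of the noise.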
[Block diagram: the encoder sends g(X_n) = X_n − θ, the channel adds Z_n to give Y_n(X_n) = g(X_n) + Z_n, and both ends update X_{n+1} = X_n − (1/n) Y_n(X_n)]
Schalkwijk-Kailath scheme
[Figure: the iterates X_1, X_2, X_3 converging to θ along g(x) = x − θ]
Schalkwijk-Kailath Coding
Another interpretation of the SK scheme, with an expected average transmitted power constraint
Schalkwijk-Kailath Coding
Y = X + Z , Z ∼ N(0, 1)
I Expected average transmitted power constraint:
Σ_{i=1}^{n} E(g_i²(m, Y^{i−1})) ≤ nP, m ∈ [1 : 2^{nR}]
I Divide the interval [−√P, √P] into 2^{nR} message intervals
I Represent each message m by the midpoint of its interval
[Figure: the interval [−√P, √P] divided into message intervals of width ∆ = 2√P · 2^{−nR}, with message point θ(m)]
Schalkwijk-Kailath Coding
I The transmitter first sends the message point itself:
X0 = θ(m)
I It is corrupted by additive Gaussian noise, so received with some bias
Y0 = θ(m) + Z0
I The goal of the transmitter is to refine the receiver's knowledge of the bias
I It computes the MMSE estimate of the bias given the output sequence observed thus far, and sends the error term
Schalkwijk-Kailath Coding
I For i = 1, the encoder learns Z_0 = Y_0 − X_0 and transmits
X_1 = γ_1 Z_0
where γ_1 = √P is chosen so that E(X_1²) = P
I It thus sends the Gaussian random variable Z_0 to the receiver, reducing the effect of the noise on the original transmission
I For i ∈ [2 : n], it transmits
X_i = γ_i (Z_0 − E(Z_0 | Y^{i−1}))
where γ_i is chosen to meet the power constraint
Schalkwijk-Kailath Decoding rule
I After the n transmissions to convey Z_0, the receiver combines its estimate of Z_0 with Y_0 to get an estimate of the message point
I The receiver uses a nearest-neighbor decoding rule to recover the message point:
Θ̂_n = Y_0 − E(Z_0 | Y^n) = θ(m) + Z_0 − E(Z_0 | Y^n)
Schalkwijk-Kailath Error Analysis
Theorem: The probability of decoding error decreases as a second-order exponent in block length for rates below capacity.
I The decoder makes an error if Θ̂_n is closer to a neighboring message point than to θ(m), i.e. if
|Θ̂_n − θ(m)| > ∆/2
p_e^{(n)} ≤ 2Q(2^{nC(P)} ∆/2), where Q(x) = ∫_x^∞ (1/√(2π)) e^{−t²/2} dt
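The bound can be evaluated numerically to see the doubly exponential decay. In this sketch (my own illustrative parameters, with ∆ = 2√P · 2^{−nR} as in the setup above):

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

P, R = 1.0, 0.3
C = 0.5 * math.log2(1 + P)   # C(P) = (1/2) log2(1 + P) = 0.5 here
bounds = []
for n in (10, 20, 40):
    delta = 2 * math.sqrt(P) * 2 ** (-n * R)   # message spacing
    # the Q argument grows like 2^{n(C - R)}, so the bound is a
    # double exponential in n
    bounds.append(2 * Q(2 ** (n * C) * delta / 2))
```

Even at these tiny block lengths the bound collapses: doubling n does far more than square the error probability.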
Schalkwijk-Kailath Error Analysis
Distribution of Θ̂_n: Gaussian with mean θ(m) and what variance?
Θ̂_n = Y_0 − E(Z_0 | Y^n) = θ(m) + Z_0 − E(Z_0 | Y^n)
I(Z_0; Y^n) = h(Z_0) − h(Z_0 | Y^n) = (1/2) log(1 / Var(Z_0 | Y^n))
⇒ Var(Z_0 | Y^n) = 2^{−2 I(Z_0; Y^n)}
Schalkwijk-Kailath Error Analysis
I(Z_0; Y^n) = Σ_{i=1}^{n} I(Z_0; Y_i | Y^{i−1})
            = Σ_{i=1}^{n} (h(Y_i | Y^{i−1}) − h(Y_i | Z_0, Y^{i−1}))
            = Σ_{i=1}^{n} (h(Y_i) − h(Z_i | Z_0, Y^{i−1}))
            = Σ_{i=1}^{n} (h(Y_i) − h(Z_i))
            = (n/2) log(1 + P)
            = n C(P)
Schalkwijk-Kailath Error Analysis
Theorem: The channel input X_i is independent of the previous outputs Y^{i−1}.
Proof.
I Z_0 ⊥ Z_1, and both are Gaussian
I Y_1 = γ_1 Z_0 + Z_1 ⇒ E(Z_0 | Y_1) is linear in Y_1
I X_2 = γ_2 (Z_0 − E(Z_0 | Y_1)) is Gaussian and ⊥ Y_1
I Z_2 is Gaussian and ⊥ Y_1
I Y_2 = X_2 + Z_2 is Gaussian and ⊥ Y_1
...
Error Exponent
Var(Z_0 | Y^n) = 2^{−2nC(P)}
Θ̂_n ∼ N(θ(m), 2^{−2nC(P)})
[Shannon 59] No feedback:
p_e^{(n)} = e^{−O(n)}
With feedback:
p_e^{(n)} = exp(−exp(O(n(C − R))))
Recursion rule for SK scheme
X_i = γ_i (Z_0 − E(Z_0 | Y^{i−1}))
    = γ_i (Z_0 − E(Z_0 | Y^{i−2}) + E(Z_0 | Y^{i−2}) − E(Z_0 | Y^{i−1}))
    = (γ_i / γ_{i−1}) (X_{i−1} − E(X_{i−1} | Y^{i−1}))
    = (γ_i / γ_{i−1}) (X_{i−1} − E(X_{i−1} | Y_{i−1}))
X_i ∝ X_{i−1} − E(X_{i−1} | Y_{i−1})
Schalkwijk-Kailath Coding
Important observation:
X_1 ∝ Z_0 ∼ N(0, 1)
X_i ∝ Z_0 − E(Z_0 | Y^{i−1}) ⊥ Y^{i−1}
Schalkwijk-Kailath observation
Theorem: The channel input X_i is independent of the previous outputs Y^{i−1}.
Proof.
I Z_0 ⊥ Z_1, and both are Gaussian
I Y_1 = γ_1 Z_0 + Z_1 ⇒ E(Z_0 | Y_1) is linear in Y_1
I X_2 = γ_2 (Z_0 − E(Z_0 | Y_1)) is Gaussian and ⊥ Y_1
I Z_2 is Gaussian and ⊥ Y_1
I Y_2 = X_2 + Z_2 is Gaussian and ⊥ Y_1
...
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Posterior Matching Scheme
I At each time, the receiver calculates the a posteriori density function of the message point: f_n(θ) = f_{θ|Y^n}(θ | y^n)
I The transmitter can track f_n(θ) as well.
I The goal is to select the transmission functions g_n for the fastest concentration of f_n(θ) around θ_0
What is the best selection of the transmission functions g_i?
g_i(θ, Y^{i−1}) = F_X^{−1} ∘ F_{Θ|Y^{i−1}}(θ | Y^{i−1})

I The encoder first extracts the information regarding θ_0 still missing at the receiver from the a posteriori distribution by
  I generating a random variable that is statistically independent of past observations,
  I which, when coupled with those observations, uniquely determines the intended message θ_0
I This information is then matched to the optimal input distribution of the channel, F_X, to achieve capacity
I The posterior is thus stretched into the desired input distribution
Posterior Matching Scheme
The inputs to the channel are the random variables
X_1 = F_X^{−1}(F_Θ(Θ))
X_i = F_X^{−1}(F_{Θ|Y^{i−1}}(Θ | Y^{i−1}))
Note that because F_{Θ|Y^{i−1}}(Θ | Y^{i−1}) is distributed uniformly on [0, 1] regardless of the sequence Y^{i−1}, it follows that
I X_i is independent of Y^{i−1} and, due to the memoryless nature of the channel, Y_i is independent of Y^{i−1}
I The marginal distribution of X_i is P_X, the capacity-achieving distribution. Consequently, {Y_i} are i.i.d.
Proposition
Let X be a continuous real-valued random variable. The random variable Z = F_X(X) is uniformly distributed on [0, 1].
Proof. Let Z = F_X(X). Then
F_Z(x) = p(Z ≤ x)
       = p(F_X(X) ≤ x)
       = p(X ≤ F_X^{−1}(x))
       = F_X(F_X^{−1}(x))
       = x
for any x ∈ [0, 1], which shows that Z is a uniform random variable on [0, 1].
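The proposition is easy to verify by simulation. This sketch (illustrative; the exponential distribution is just an arbitrary continuous choice, not from the talk) draws samples of Z = F_X(X) and compares their empirical CDF with the uniform CDF:

```python
import math
import random

rng = random.Random(3)
# X exponential with rate 1, so F_X(x) = 1 - exp(-x); the proposition
# says Z = F_X(X) should be uniform on [0, 1]
samples = [1.0 - math.exp(-rng.expovariate(1.0)) for _ in range(100_000)]

# compare the empirical CDF of Z with the uniform CDF at a few points
max_dev = max(abs(sum(z <= t for z in samples) / len(samples) - t)
              for t in (0.1, 0.25, 0.5, 0.75, 0.9))
```

The deviation shrinks at the usual 1/√n Monte Carlo rate, consistent with Z being uniform.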
Proposition
Suppose that Θ ∼ U[0, 1] and let X be a real-valued random variable. Then the random variable Y = F_X^{−1}(Θ) has the same distribution as X.
Proof.
F_Y(x) = p(Y ≤ x)
       = p(F_X^{−1}(Θ) ≤ x)
       = p(Θ ≤ F_X(x))
       = F_X(x)
F_X(x) ∈ [0, 1] for all x ⇒ X and Y have the same distribution
Posterior matching AWGN channel
I Let p_{Y|X} be an AWGN channel with noise variance N
I Set a Gaussian input distribution X ∼ N(0, P) (capacity achieving for an input power constraint P)
I Derive the posterior matching scheme in this case
I Let SNR = P/N
X_{i+1} = √(1 + SNR) (X_i − (SNR / (1 + SNR)) Y_i)
I The transmitter sends the error term pertaining to the MMSE estimate of X_i from Y_i
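The recursion preserves the input distribution, which can be checked with a deterministic second-moment calculation (my own sketch; the values of P and N are arbitrary): if Var(X_i) = P, then X_{i+1} again has variance P and is uncorrelated with Y_i.

```python
# second-moment check of the posterior matching recursion
# X_{i+1} = sqrt(1 + SNR) * (X_i - SNR/(1 + SNR) * Y_i),  Y_i = X_i + Z_i
P, N = 2.0, 0.5                  # arbitrary power and noise variance
snr = P / N
rho = snr / (1 + snr)            # MMSE coefficient, equals P / (P + N)
scale = (1 + snr) ** 0.5

var_x = P                        # Var(X_1) = P
for _ in range(20):
    cov_xy = var_x               # Cov(X_i, Y_i) = Var(X_i)
    var_y = var_x + N            # Var(Y_i) = Var(X_i) + N
    cov_next_y = scale * (cov_xy - rho * var_y)   # Cov(X_{i+1}, Y_i)
    var_x = scale ** 2 * (var_x - 2 * rho * cov_xy + rho ** 2 * var_y)
```

Since all quantities are jointly Gaussian, zero correlation here is the same as the independence of X_{i+1} and Y^i claimed by the general theory.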
Posterior Matching Scheme-BSC
I Set P_X = Bern(1/2) (capacity achieving)
I The PM scheme coincides with Horstein's median rule:
X_{n+1} = F_X^{−1}(F_{Θ_0|Y^n}(Θ_0 | Y^n)) = 1 if Θ_0 > median of f_{Θ_0|Y^n}(· | Y^n), and 0 otherwise
I F_X^{−1} quantizes above/below 1/2
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Multiple Access channel, No feedback
[Block diagram: inputs X_1 and X_2 enter the multiple access channel p(Y | X_1, X_2), producing output Y]
Capacity region:
R_1 < I(X_1; Y | X_2)
R_2 < I(X_2; Y | X_1)
R_1 + R_2 < I(X_1, X_2; Y)
for some p(x_1) p(x_2), with channel p(y | x_1, x_2)
Multiple access channel, No feedback
[Figure: capacity region of the erasure MAC without feedback: R_1 < 1, R_2 < 1, R_1 + R_2 < 1.5]
Does feedback help in MAC?
Yes! Gaarder-Wolf 1975
Erasure MAC with feedback
I R_sym = 2/3: N uncoded transmissions + N/2 one-sided retransmissions:
transmitter 1: 010010101011100
transmitter 2: 110100011011001
Output: 120110112022101
Probability of erasure = 1/2 ⇒ N/2 bits are erased.
I Transmitter 1 retransmits the erased bits over the next N/2 transmissions
N bits are sent over N + N/2 transmissions ⇒ R = 2/3 is achievable
Block feedback coding scheme: Erasure MAC with feedback
I R_sym = 3/4: N uncoded transmissions + N/4 two-sided retransmissions + N/16 + ...
I The two encoders can cooperate by each sending half of the N/2 erased bits over the following N/4 transmissions
R = N / (N + N/4 + N/16 + ...) = 3/4
[Figure: the symmetric rate pair (3/4, 3/4) on the boundary of the feedback region]
Erasure MAC with feedback (Gaarder-Wolf 1975)
I R_sym = 0.7602: N uncoded transmissions + N/(2 log2 3) cooperative retransmissions
I The encoders can cooperate and use three symbols: (0, 0), (1, 1), and (1, 0)
They resolve erasures at log2 3 bits/channel use:
R = N / (N + N/(2 log2 3)) = 0.7602
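The three symmetric rates above can be recovered from the corresponding geometric series (a small illustrative calculation, normalizing N = 1):

```python
import math

N = 1.0   # normalize the initial uncoded block length
# one-sided retransmission: N + N/2 channel uses to deliver N bits
r_one_sided = N / (N + N / 2)                    # 2/3
# two-sided cooperation: N + N/4 + N/16 + ... = N / (1 - 1/4) uses
r_two_sided = N / (N / (1 - 1 / 4))              # 3/4
# cooperative ternary retransmission resolves erasures at log2(3)
# bits per channel use, costing N / (2 log2 3) extra uses
r_ternary = N / (N + N / (2 * math.log2(3)))     # about 0.7602
```

Each step of extra cooperation strictly improves the symmetric rate: 2/3 < 3/4 < 0.7602.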
[Figure: the symmetric rate pair (0.76, 0.76)]
Can we do better? Cover-Leung inner bound
Rsym = 0.7911 (Cover-Leung 1981)
Theorem: (R_1, R_2) is achievable for the MAC with feedback if
R_1 < I(X_1; Y | X_2, U)
R_2 < I(X_2; Y | X_1, U)
R_1 + R_2 < I(X_1, X_2; Y)
for some p(u) p(x_1 | u) p(x_2 | u)
[Block diagram: Enc 1 sends X_1^n(j) and Enc 2 sends X_2^n(j) over P(Y | X_1, X_2); from Y^n(j − 1) the encoders learn M̃_{2,j−1}, and block j carries M_{1,j}, M_{2,j} together with M_{2,j−1}]
Cover-Leung Achievability proof
I Block Markov coding:
Messages are sent over b blocks of transmission.
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Two way channel
Shannon inner bound:
R1 < I (X1;Y |X2)
R2 < I (X2;Y |X1)
for some p(x1)p(x2)
Two way channel
Shannon outer bound:
R1 < I (X1;Y |X2)
R2 < I (X2;Y |X1)
for some p(x1, x2)
Directed information
1 Entropy
H(Y^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1})
2 Conditional entropy
H(Y^n | X^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1}, X^n)
3 Causally conditioned entropy
H(Y^n || X^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1}, X^i)
1 − 2 ⇒ I(Y^n; X^n), the mutual information
1 − 3 ⇒ I(X^n → Y^n), the directed information
Directed information
The directed information from a random vector A^N to another random vector B^N is
I(A^N → B^N) = Σ_{n=1}^{N} I(A^n; B_n | B^{n−1})
Mutual information:
I(A^N; B^N) = Σ_{n=1}^{N} I(A^N; B_n | B^{n−1})
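The definitions can be exercised on a toy two-step feedback example (my own construction, not from the talk): X_1 ~ Bern(1/2) passes through a BSC(p) to give Y_1, which is fed back and retransmitted noiselessly as X_2 = Y_1, Y_2 = X_2. With feedback, ordinary mutual information overcounts (it picks up the feedback dependence), while directed information does not:

```python
import math
from itertools import product

def H(pmf):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def marginal(joint, keep):
    """Marginalize a dict {outcome tuple: prob} onto the given indices."""
    out = {}
    for k, p in joint.items():
        kk = tuple(k[i] for i in keep)
        out[kk] = out.get(kk, 0.0) + p
    return out

def cond_mi(joint, a_idx, b_idx, c_idx):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C) from a joint pmf."""
    return (H(marginal(joint, a_idx + c_idx).values())
            + H(marginal(joint, b_idx + c_idx).values())
            - H(marginal(joint, a_idx + b_idx + c_idx).values())
            - H(marginal(joint, c_idx).values()))

p = 0.25
joint = {}
for x1, z1 in product((0, 1), repeat=2):
    y1 = x1 ^ z1                       # BSC(p) output
    x2, y2 = y1, y1                    # feedback: X2 = Y1, sent noiselessly
    key = (x1, x2, y1, y2)             # indices 0,1 = X^2; 2,3 = Y^2
    joint[key] = joint.get(key, 0.0) + 0.5 * (1 - p if z1 == 0 else p)

# I(X^2 -> Y^2) = I(X1; Y1) + I(X1,X2; Y2 | Y1)
directed = cond_mi(joint, (0,), (2,), ()) + cond_mi(joint, (0, 1), (3,), (2,))
# ordinary mutual information I(X^2; Y^2)
mutual = cond_mi(joint, (0, 1), (2, 3), ())
```

Here mutual = 1 bit, because X_2 = Y_1 makes X^2 fully informative about Y^2, while directed = 1 − H(p) bits, the information actually carried through the channel.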
TWC - Capacity region (Kramer 2003)
Theorem: Let R_N be the set of rate pairs (R_1, R_2) with
R_1 ≤ (1/N) I(X_1^N → Y^N || X_2^N)
R_2 ≤ (1/N) I(X_2^N → Y^N || X_1^N)
for some p(x_1^N || y^{N−1}) p(x_2^N || y^{N−1}). Then R = ∪_N R_N.
Outline
I Introduction
I Point to point communication
  I Horstein coding scheme
  I Block feedback coding scheme for BSC
  I Schalkwijk-Kailath coding scheme
  I Posterior matching scheme
I Multiuser channel
  I Multiple access channel
  I Two-way channel
I Source coding with feedforward
Source coding with side information
[Figure: source coding with side information (block length = 5): over time steps 1-10, the encoder maps each block of source symbols X_1, ..., X_10 to an index W, and the decoder combines W with the side information Y_1, ..., Y_10 to produce the reconstructions X̂_1, ..., X̂_5]
Source coding with feedforward
[Figure: source coding with feedforward (block length = 5): the encoder maps the source block X_1, ..., X_10 to an index W, and the decoder forms each X̂_n from W and delayed past source samples X_1, X_2, ...; shown with delay 6, and with delay 1 feedforward in the lower diagram]
Source coding with Feedforward
An (N, 2^{NR}) source code consists of an encoding function
f : X^N → {1, ..., 2^{NR}}
and decoding functions
g_n : {1, ..., 2^{NR}} × X^{n−1} → X̂, n = 1, ..., N
Directed information
1 Entropy
H(Y^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1})
2 Conditional entropy
H(Y^n | X^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1}, X^n)
3 Causally conditioned entropy
H(Y^n || X^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1}, X^i)
1 − 2 ⇒ I(Y^n; X^n), the mutual information
1 − 3 ⇒ I(X^n → Y^n), the directed information
Directed information
The directed information from a random vector A^N to another random vector B^N is
I(A^N → B^N) = Σ_{n=1}^{N} I(A^n; B_n | B^{n−1})
Mutual information:
I(A^N; B^N) = Σ_{n=1}^{N} I(A^N; B_n | B^{n−1})
The directed information from the reconstruction X̂^N to the source X^N is
I(X̂^N → X^N) = I(X^N; X̂^N) − Σ_{n=2}^{N} I(X^{n−1}; X̂_n | X̂^{n−1})
I A direct coding theorem for a general source with feedforward, assuming that the joint random process {X_n, X̂_n} is discrete, stationary, and ergodic
I For stationary and ergodic joint processes, the directed information rate exists and is defined by
I(X̂ → X) = lim_{N→∞} (1/N) I(X̂^N → X^N)
Theorem: For a discrete stationary and ergodic source X characterized by a distribution P_X, all rates R such that
R ≥ R*(D) = inf_{P_{X̂|X} : lim_{N→∞} E[d_N(X^N, X̂^N)] ≤ D} I(X̂ → X)
are achievable at expected distortion D.
Proof. The proof uses an AEP for directed quantities,
−(1/N) log P(X̂^N || X^N) → H(X̂ || X) w.p. 1,
and defines a new kind of typicality called "directed typicality".
Thank you
Questions?