
The Role of Feedback in Communications

Solmaz Torabi

Dept. of Electrical and Computer Engineering, Drexel University
st669@drexel.edu

Advisor: Dr. John M. Walsh

April 22, 2015


References I

M. Horstein, "Sequential transmission using noiseless feedback," IEEE Transactions on Information Theory, vol. 9, no. 3, pp. 136–143, 1963.

J. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback–I: No bandwidth constraint," IEEE Transactions on Information Theory, vol. 12, no. 2, pp. 172–182, 1966.

O. Shayevitz and M. Feder, "Optimal feedback communication via posterior matching," IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1186–1222, 2011.

N. Gaarder and J. K. Wolf, "The capacity region of a multiple-access discrete memoryless channel can increase with feedback (corresp.)," IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 100–102, 1975.

G. Kramer, "Directed information for channels with feedback," Ph.D. dissertation, ETH Zurich, 1998.


References II

A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.

R. Venkataramanan and S. S. Pradhan, "Source coding with feed-forward: Rate-distortion theorems and error exponents for a general source," IEEE Transactions on Information Theory, vol. 53, no. 6, pp. 2154–2179, 2007.


Outline

- Introduction
- Point-to-point communication
  - Horstein coding scheme
  - Block feedback coding scheme for the BSC
  - Schalkwijk-Kailath coding scheme
  - Posterior matching scheme
- Multiuser channels
  - Multiple access channel
  - Two-way channel
- Source coding with feedforward


DMC with feedback

[Figure: block diagram — the message M enters the encoder, which produces X_i; the channel p(y|x) outputs Y_i to the decoder, which produces M̂; a feedback link supplies Y^{i-1} to the encoder.]

- Memoryless channel: P_{Y_n | X^n, Y^{n-1}}(. | x^n, y^{n-1}) = P(. | x_n)
- Message: M
- Encoding functions: g_i : M × Y^{i-1} → X
- X_n is a function of (M, Y_1, Y_2, ..., Y_{n-1})


Point to point feedback communication system

[Figure: the same block diagram, with feedback Y^{i-1} available at the encoder.]

- The channel is memoryless if P(y_n | x^n, y^{n-1}) = P(y_n | x_n)
- The channel is used without feedback if P(x_n | x^{n-1}, y^{n-1}) = P(x_n | x^{n-1})
- DMC without feedback: P(y^N | x^N) = ∏_{n=1}^{N} P(y_n | x_n)


Point to point feedback communication system

[Figure: the same feedback block diagram.]

- If the channel is memoryless, feedback provides no information that can increase the achievable rate:

C_FB = max_{p(x)} I(X; Y) = C


Capacity of Memoryless Feedback Channel

- Shannon 56: Feedback does not increase the capacity of a memoryless channel
- Simplifying schemes for attaining it:
  - Horstein 63: Developed a recursive coding strategy for the BSC with noiseless feedback (a sequential coding scheme with varying block length)
  - Schalkwijk-Kailath 66: AWGN channel
  - Shayevitz 2008: Extends the SK and Horstein schemes to general memoryless channels
- Gaarder-Wolf 75: Feedback enlarges the capacity region of multiuser channels


Point to point feedback communication system

Feedback can:

- Simplify coding schemes
- Improve reliability (the error probability decreases much faster)
- Increase the capacity of channels with memory
- Enlarge the capacity region of multiuser channels (Gaarder-Wolf 1975)


Iterative refinement for BEC

- First send the message at a rate higher than the channel capacity (without coding)
- Then iteratively refine the receiver's knowledge about the message

[Figure: the BEC(p) — input 0 passes to 0 and input 1 passes to 1 with probability 1 - p; each input is erased to "?" with probability p.]

n + pn + p^2 n + ... = n / (1 - p)

channel uses suffice to transmit n bits reliably.

- We can achieve the capacity C = 1 - p by simply retransmitting each bit after it is erased.
- There is no need for sophisticated error-correcting codes.
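As a sanity check, here is a minimal simulation sketch of this retransmission strategy (the function names are illustrative, not from the talk): each erased bit is resent until it arrives, and the empirical rate approaches 1 - p.

```python
import random

def send_over_bec(bits, p):
    """Pass bits through a BEC(p); erased positions come back as None."""
    return [b if random.random() > p else None for b in bits]

def transmit_with_feedback(bits, p):
    """Resend every erased bit until it gets through; count channel uses."""
    uses, pending = 0, list(bits)
    while pending:
        received = send_over_bec(pending, p)
        uses += len(pending)
        pending = [b for b, r in zip(pending, received) if r is None]
    return uses

random.seed(0)
n, p = 100_000, 0.3
print(n / transmit_with_feedback([0] * n, p))  # empirical rate ~ 1 - p = 0.7
```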


Noiseless binary forward channel

A binary search algorithm provides an effective procedure for transmitting the information in the source's choice.

- The receiver starts out with a uniform prior distribution for the selected message point
- The a priori median of the receiver distribution is m_0 = 1/2
- Suppose 1 was sent. The new receiver distribution is then uniform over the interval (1/2, 1).

[Figure: the unit interval [0, 1] with successive medians 1/2, 3/4, 5/8 closing in on the message point θ = 0.101 (binary).]
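A minimal sketch of this noiseless binary search, assuming the receiver simply halves its interval at each step (all names here are illustrative):

```python
def binary_search_transmit(theta, n):
    """Send n binary-search answers for a message point theta in [0, 1)."""
    lo, hi, bits = 0.0, 1.0, []
    for _ in range(n):
        mid = (lo + hi) / 2          # current median of the receiver's interval
        bit = int(theta >= mid)      # 1: theta lies to the right of the median
        bits.append(bit)
        lo, hi = (mid, hi) if bit else (lo, mid)
    return bits, (lo + hi) / 2       # receiver's midpoint estimate

print(binary_search_transmit(0.625, 3))  # ([1, 0, 1], 0.6875): theta = 0.101 binary
```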



Horstein Scheme

- Horstein developed a recursive coding strategy for the BSC with noiseless feedback
- Horstein's feedback scheme can be viewed as a binary search algorithm with lies.
- The transmitter transmits a 0 on the (i + 1)st transmission when the true message point θ is to the left of the current median m_i, and a 1 otherwise.
- However, since crossovers can occur in the channel, the transmitter does not know what the receiver's current median m_i is.
- A noiseless feedback channel is used to provide the transmitter with this information.


Horstein Scheme

- Divide the interval [0, 1] into 2^{nR} equidistant subintervals.
- Represent each message by the midpoint of its interval.
- The receiver has no prior knowledge of the location of θ, so the receiver density is initially uniform: f_0(θ) = 1 for θ ∈ [0, 1]

x_1 = g(θ_0) = 1 if θ_0 is greater than 1/2, and 0 otherwise

[Figure: the uniform density f(θ) = 1 on [0, 1] with median m_0 = 1/2.]


Horstein Scheme

- Assume x_1 = 1 is sent through the channel. It gets corrupted with probability p.
- After the channel output is observed, the receiver distribution and median are updated.
- Through the noiseless feedback, the encoder learns the distribution.

f(θ | y_1) = f(θ) p(y_1 | θ) / ∫_0^1 f(θ) p(y_1 | θ) dθ

- Assume y_1 = 1 is received.
  - For 0 ≤ θ ≤ 1/2: f(θ | y_1 = 1) = 2p
  - For 1/2 ≤ θ ≤ 1: f(θ | y_1 = 1) = 2p̄ (with p̄ = 1 - p)


Horstein Scheme

- The encoder transmits 1 if θ > median of f(θ | y_1), and 0 otherwise.

[Figure: evolution of the posterior — the uniform prior f(θ) with median m_0; after X_1 = 1, Y_1 = 1 the density f_{θ|Y_1}(θ|1) equals 2p on [0, m_0] and 2p̄ on [m_0, 1], with median m_1; after X_2 = 0, Y_2 = 0 the density f_{θ|Y^2}(θ|10) has pieces labeled 4pp̄ and 4p̄^2, with median m_2.]

- The scheme terminates when most of the probability mass is concentrated in the neighborhood of one of the possible message points


Horstein Scheme

The scheme admits a simple recursive structure:

f(θ | y^i) = 2p̄ f(θ | y^{i-1}) if θ lies on the side of the median of f(θ | y^{i-1}) indicated by y_i, and 2p f(θ | y^{i-1}) otherwise.

It terminates when the receiver distribution is sufficiently steep.

[Figure: the receiver CDF of θ on [0, 1], with medians m_0, m_1 and a step whose slope is governed by p.]
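The update above is easy to simulate on a discretized posterior. Below is a minimal sketch, assuming a fine grid over [0, 1] and a BSC(0.1); none of the names come from the talk. The receiver's median converges to the message point.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.1                                   # BSC crossover probability
grid = np.linspace(0, 1, 10_001)[1:]      # fine grid over (0, 1]
f = np.ones_like(grid)                    # receiver density, initially uniform
theta = 0.6180339887                      # message point, known to the encoder

def median(grid, f):
    cdf = np.cumsum(f)
    return grid[np.searchsorted(cdf / cdf[-1], 0.5)]

for _ in range(2000):
    m = median(grid, f)                   # encoder tracks it via feedback
    x = 1 if theta > m else 0             # Horstein encoding rule
    y = x ^ (rng.random() < p)            # BSC(p)
    # scale the half of the density agreeing with y by (1-p), the rest by p
    agree = (grid > m) if y == 1 else (grid <= m)
    f = np.where(agree, f * (1 - p), f * p)
    f /= f.sum() * (grid[1] - grid[0])    # renormalize to a density

print(median(grid, f))                    # concentrates near theta
```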


Horstein Scheme: Decoding

- The decoder uses maximum a posteriori decoding
- It finds the interval of length 2^{-nR} that maximizes

∫_β^{β + 2^{-nR}} f(θ | y^n) dθ


Horstein Scheme: Error probability

- By analysis of the evolution of f(θ | Y^i), i ∈ [1 : n], based on the iterated function system, it can be shown that

P(θ ∉ [β, β + 2^{-nR}]) → 0 as n → ∞ if R < C

- With high probability, θ(M) is the unique message point within [β, β + 2^{-nR})



Block Feedback Coding Scheme for BSC

Ahlswede 1973:

- Implement the iterative refinement at the block level
- The encoder initially transmits an uncoded block of information
- It then refines the receiver's knowledge about it in subsequent blocks


Block Feedback Coding Scheme for BSC

- Tx. 1: Sends N uncoded data bits over the channel.
- Ch. 1: Adds (modulo 2) N samples of Bern(p) noise.
- Rx. 1: Feeds its N noisy observations back to the Tx.
- Tx. 2:
  - (a) Finds the N samples of noise added by the channel.
  - (b) Compresses the noise into NH(p) new data bits.
  - (c) Sends these data bits uncoded over the channel.
- Ch. 2: Adds (modulo 2) NH(p) samples of Bern(p) noise.
- Rx. 2: Feeds its NH(p) noisy observations back to the Tx.


Block Feedback Coding Scheme for BSC

The number of channel uses needed to send the N bits is

N + NH(p) + NH(p)^2 + ... = N / (1 - H(p)),

which corresponds to a rate of 1 - H(p), the capacity of the BSC(p).
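A quick numerical sketch of this geometric-series argument (the helper names are illustrative): truncating the scheme after a few rounds already gives a rate close to 1 - H(p).

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def block_feedback_rate(p, rounds=30):
    """Rate of the block scheme: each round re-sends a compressed
    description of the previous round's noise, shrinking the block
    by a factor of H(p) each time."""
    uses, block = 0.0, 1.0
    for _ in range(rounds):
        uses += block
        block *= h(p)
    return 1.0 / uses

p = 0.11
print(block_feedback_rate(p), 1 - h(p))   # both ~ 0.5 = BSC(0.11) capacity
```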



Robbins-Monro procedure

How can one determine θ, a zero of a function g(x), without knowing the shape of the function?

- The observations are noisy: instead of g(x), one observes Y(x) = g(x) + Z


Robbins-Monro procedure

X_{n+1} = X_n - a_n Y_n(X_n),  n = 1, 2, ...

- If Σ_n a_n = ∞ and Σ_n a_n^2 < ∞, then X_n → θ almost surely

[Figure: a curve g(x) crossing zero at θ, with noisy observations Y_1(X_1) = g(X_1) + Z_1 and steps of slope 1/a_1, 1/a_2 taking X_1 to X_2 to X_3.]
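A minimal sketch of the procedure with the classic step size a_n = 1/n, which satisfies both conditions (the function and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def robbins_monro(g, x1, steps):
    """Iterate X_{n+1} = X_n - a_n (g(X_n) + Z_n) with a_n = 1/n."""
    x = x1
    for n in range(1, steps + 1):
        y = g(x) + rng.standard_normal()   # noisy observation Y_n(X_n)
        x -= y / n
    return x

theta = 0.3
print(robbins_monro(lambda x: x - theta, x1=0.5, steps=100_000))  # ~ 0.3
```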


Schalkwijk-Kailath scheme

Apply the procedure to the straight line g(x) = x - θ:

- Start with X_1 = 1/2 and send the receiver the number g(X_1) = X_1 - θ
- The receiver obtains the number Y_1(X_1) = (X_1 - θ) + Z_1, where Z_1 ~ N(0, 1)

[Figure: the line g(X) = X - θ over [0, 1], with X_1, the noise Z_1, and the observation Y_1(X_1).]


Schalkwijk-Kailath scheme

The recursion is easily solved to yield

X_{n+1} = θ - (1/n) Σ_{i=1}^n Z_i ~ N(θ, 1/n)

[Figure: block diagram — the encoder sends g(X_n) = X_n - θ; the channel adds Z_n to give Y_n(X_n) = g(X_n) + Z_n; the decoder updates X_{n+1} = X_n - (1/n) Y_n(X_n).]


Schalkwijk-Kailath scheme

[Figure: the line g(x) = x - θ with the iterates X_1, X_2, X_3 converging toward θ under the noisy observations Y_1(X_1) = g(X_1) + Z_1, etc.]
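A minimal simulation sketch of this recursion over unit-variance AWGN (names illustrative): the receiver's iterate X_{n+1} indeed behaves like N(θ, 1/n).

```python
import numpy as np

rng = np.random.default_rng(2)

def sk_iterate(theta, n):
    """Encoder sends g(X_i) = X_i - theta; receiver runs X_{i+1} = X_i - Y_i/i.
    Feedback lets the encoder track X_i, so it always knows what to send."""
    x = 0.5                                      # X_1 = 1/2, known to both ends
    for i in range(1, n + 1):
        y = (x - theta) + rng.standard_normal()  # Y_i = g(X_i) + Z_i
        x -= y / i
    return x

theta = 0.3125
est = np.array([sk_iterate(theta, 1000) for _ in range(2000)])
print(est.mean(), est.var())                     # ~ theta and ~ 1/1000
```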


Schalkwijk-Kailath Coding

Another interpretation of the SK scheme, with an expected average transmitted power constraint.


Schalkwijk-Kailath Coding

Y = X + Z, Z ~ N(0, 1)

- Expected average transmitted power constraint:

Σ_{i=1}^n E(g_i^2(m, Y^{i-1})) ≤ nP for every m ∈ [1 : 2^{nR}]

- Divide the interval [-√P, √P] into 2^{nR} message intervals
- Represent each message m by the midpoint θ(m) of its interval

[Figure: the interval [-√P, √P] with message point θ(m) and spacing Δ = 2√P · 2^{-nR}.]


Schalkwijk-Kailath Coding

- The transmitter first sends the message point itself:

X_0 = θ(m)

- It is corrupted by additive Gaussian noise, so it is received with some bias:

Y_0 = θ(m) + Z_0

- The goal of the transmitter is to refine the receiver's knowledge of the bias
- It computes the MMSE estimate of the bias given the output sequence observed thus far and sends the error term


Schalkwijk-Kailath Coding

- For i = 1, the encoder learns Z_0 = Y_0 - X_0 and transmits

X_1 = γ_1 Z_0, where γ_1 = √P is chosen so that E(X_1^2) = P

- This sends the Gaussian random variable Z_0 to the receiver, reducing the effect of the noise on the original transmission
- For i ∈ [2 : n], it transmits

X_i = γ_i (Z_0 - E(Z_0 | Y^{i-1})),

where γ_i is chosen to meet the power constraint


Schalkwijk-Kailath Decoding rule

- After the n transmissions used to convey Z_0, the receiver combines its estimate of Z_0 with Y_0 to get an estimate of the message point
- The receiver uses a nearest-neighbor decoding rule to recover the message point:

Θ̂_n = Y_0 - E(Z_0 | Y^n) = θ(m) + Z_0 - E(Z_0 | Y^n)
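Below is a minimal end-to-end sketch of the power-constrained scheme, assuming unit noise variance and scalar LMMSE updates (every name is illustrative, not from the talk). The encoder tracks the decoder's residual error about Z_0 through the feedback and rescales it to power P each round; the decoder finishes with nearest-neighbor decoding.

```python
import numpy as np

rng = np.random.default_rng(3)
P, n, R = 1.0, 25, 0.4                    # power, uses, rate < C = 0.5 bit/use
M = int(2 ** (n * R))                     # number of messages
delta = 2 * np.sqrt(P) / M
points = -np.sqrt(P) + delta * (np.arange(M) + 0.5)   # midpoints theta(m)

m = rng.integers(M)
y0 = points[m] + rng.standard_normal()    # send theta(m); decoder sees Y_0
err = y0 - points[m]                      # encoder learns Z_0 via feedback
z0_hat, var = 0.0, 1.0                    # decoder estimate, Var(Z_0 | Y^{i-1})
for _ in range(n):
    gamma = np.sqrt(P / var)              # rescale the residual to power P
    y = gamma * err + rng.standard_normal()
    update = (P / (P + 1)) * y / gamma    # LMMSE increment for Z_0
    z0_hat += update
    err -= update                         # encoder tracks the same quantity
    var /= P + 1                          # variance shrinks by (1 + P) per use

m_hat = np.argmin(np.abs(points - (y0 - z0_hat)))     # nearest neighbor
print(m, m_hat)                           # agree with overwhelming probability
```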


Schalkwijk-Kailath Error Analysis

Theorem. The probability of decoding error decreases as a second-order (doubly exponential) function of the block length for rates below capacity.

- The decoder makes an error if Θ̂_n is closer to a neighbor of θ(m) than to θ(m), i.e.,

|Θ̂_n - θ(m)| > Δ/2

p_e^{(n)} ≤ 2Q(2^{nC(P)} Δ/2), where Q(x) = ∫_x^∞ (1/√(2π)) e^{-t^2/2} dt
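Plugging in Δ = 2√P · 2^{-nR} from the message construction, the bound becomes 2Q(√P · 2^{n(C-R)}), which can be evaluated directly (a small sketch, assuming P = 1 and R = 0.4):

```python
from math import erfc, log2, sqrt

def q(x):
    """Gaussian tail Q(x) via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2))

P, R = 1.0, 0.4
C = 0.5 * log2(1 + P)                     # 0.5 bit per channel use
for n in (10, 20, 30, 40):
    bound = 2 * q(sqrt(P) * 2 ** (n * (C - R)))
    print(n, bound)                       # collapses doubly exponentially
```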


Schalkwijk-Kailath Error Analysis

Distribution of Θ̂_n: Gaussian with mean θ(m) — and what variance?

Θ̂_n = Y_0 - E(Z_0 | Y^n) = θ(m) + Z_0 - E(Z_0 | Y^n)

I(Z_0; Y^n) = h(Z_0) - h(Z_0 | Y^n) = (1/2) log(1 / Var(Z_0 | Y^n))

⇒ Var(Z_0 | Y^n) = 2^{-2 I(Z_0; Y^n)}


Schalkwijk-Kailath Error Analysis

I(Z_0; Y^n) = Σ_{i=1}^n I(Z_0; Y_i | Y^{i-1})
            = Σ_{i=1}^n (h(Y_i | Y^{i-1}) - h(Y_i | Z_0, Y^{i-1}))
            = Σ_{i=1}^n (h(Y_i) - h(Z_i | Z_0, Y^{i-1}))
            = Σ_{i=1}^n (h(Y_i) - h(Z_i))
            = (n/2) log(1 + P)
            = n C(P)


Schalkwijk-Kailath Error Analysis

Theorem. The channel input X_i is independent of the previous outputs Y^{i-1}.

Proof.

- Z_0 ⊥ Z_1, and both are Gaussian
- Y_1 = γ_1 Z_0 + Z_1 ⇒ E(Z_0 | Y_1) is linear in Y_1
- X_2 = γ_2 (Z_0 - E(Z_0 | Y_1)) is Gaussian and ⊥ Y_1
- Z_2 is Gaussian and ⊥ Y_1
- Y_2 = X_2 + Z_2 is Gaussian and ⊥ Y_1
- ... and so on, by induction.


Error Exponent

Var(Z_0 | Y^n) = 2^{-2nC(P)}

Θ̂_n ~ N(θ(m), 2^{-2nC(P)})

[Shannon 59] Without feedback:

p_e^{(n)} = e^{-O(n)}

With feedback:

p_e^{(n)} = exp(-exp(O(n(C - R))))


Recursion rule for SK scheme

X_i = γ_i (Z_0 - E(Z_0 | Y^{i-1}))
    = γ_i (Z_0 - E(Z_0 | Y^{i-2}) + E(Z_0 | Y^{i-2}) - E(Z_0 | Y^{i-1}))
    = (γ_i / γ_{i-1}) (X_{i-1} - E(X_{i-1} | Y^{i-1}))
    = (γ_i / γ_{i-1}) (X_{i-1} - E(X_{i-1} | Y_{i-1}))

X_i ∝ X_{i-1} - E(X_{i-1} | Y_{i-1})


Schalkwijk-Kailath Coding

Important observation:

X_1 ∝ Z_0 ~ N(0, 1)

X_i ∝ Z_0 - E(Z_0 | Y^{i-1}) ⊥ Y^{i-1}




Posterior Matching Scheme

- At each time, the receiver calculates the a posteriori density of the message point: f_n(θ) = f_{θ|Y^n}(θ | y^n)
- Via the feedback, the transmitter can track f_n(θ) as well
- The goal is to select the transmission functions g_n for the fastest concentration of f_n(θ) around θ_0


What is the best selection of the transmission functions gi?

g_i(θ, Y^{i-1}) = F_X^{-1} ∘ F_{Θ|Y^{i-1}}(θ | Y^{i-1})

- The encoder extracts the information about θ_0 still missing at the receiver from the a posteriori distribution by:
  - generating a random variable that is statistically independent of the past observations, and
  - that, when coupled with those observations, uniquely determines the intended message θ_0
- This information is then matched to the capacity-achieving input distribution F_X
- In effect, the posterior is stretched into the desired input distribution


Posterior Matching Scheme

The inputs to the channel are the random variables

X_1 = F_X^{-1}(F_Θ(Θ))
X_i = F_X^{-1}(F_{Θ|Y^{i-1}}(Θ | Y^{i-1}))

Because F_{Θ|Y^{i-1}}(Θ | Y^{i-1}) is distributed uniformly on [0, 1] regardless of the sequence Y^{i-1}, it follows that

- X_i is independent of Y^{i-1} and, by the memoryless nature of the channel, Y_i is independent of Y^{i-1}
- The marginal distribution of X_i is P_X, the capacity-achieving distribution; consequently, the {Y_i} are i.i.d.


Proposition

Let X be a continuous real-valued random variable. Then the random variable Z = F_X(X) is uniformly distributed on [0, 1].

Proof. Let Z = F_X(X). Then

F_Z(x) = P(Z ≤ x)
       = P(F_X(X) ≤ x)
       = P(X ≤ F_X^{-1}(x))
       = F_X(F_X^{-1}(x))
       = x

for any x ∈ [0, 1], which shows that Z is a uniform random variable on [0, 1].


Proposition

Suppose that Θ ~ U[0, 1] and let X be a real-valued random variable. Then the random variable Y = F_X^{-1}(Θ) has the same distribution as X.

Proof.

F_Y(x) = P(Y ≤ x)
       = P(F_X^{-1}(Θ) ≤ x)
       = P(Θ ≤ F_X(x))
       = F_X(x)

Since F_X(x) ∈ [0, 1] for all x, X and Y have the same distribution.
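Both propositions are easy to check empirically; here is a small sketch using an exponential X, where F_X and F_X^{-1} have closed forms (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(2.0, 100_000)   # X ~ Exp(scale 2): F_X(x) = 1 - exp(-x/2)
z = 1 - np.exp(-x / 2.0)            # Z = F_X(X)
print(z.mean(), z.var())            # ~ 0.5 and ~ 1/12: uniform moments

u = rng.uniform(size=100_000)       # Theta ~ U[0, 1]
y = -2.0 * np.log(1 - u)            # Y = F_X^{-1}(Theta)
print(y.mean(), y.var())            # ~ 2 and ~ 4: matches X
```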


Posterior matching AWGN channel

- Let p_{Y|X} be an AWGN channel with noise variance N
- Set the Gaussian input distribution X ~ N(0, P) (capacity-achieving under an input power constraint P)
- Deriving the posterior matching scheme in this case, with SNR = P/N:

X_{i+1} = √(1 + SNR) (X_i - (SNR / (1 + SNR)) Y_i)

- The transmitter sends (scaled to power P) the error term pertaining to the MMSE estimate of X_i from Y_i
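The key invariant from the previous slides — X_i stays N(0, P) and independent of the past outputs — can be verified by simulation. A minimal sketch with N = 1, using the standard library's inverse normal CDF for the first mapping (names illustrative):

```python
import numpy as np
from statistics import NormalDist
from math import sqrt

rng = np.random.default_rng(5)
P = 1.0                              # input power; noise variance N = 1
snr = P / 1.0
steps, trials = 10, 20_000

xs = np.empty(trials)
for t in range(trials):
    theta0 = rng.uniform()                            # message point
    x = NormalDist(0, sqrt(P)).inv_cdf(theta0)        # X_1 = F_X^{-1}(theta0)
    for _ in range(steps):
        y = x + rng.standard_normal()                 # AWGN with N = 1
        x = sqrt(1 + snr) * (x - snr / (1 + snr) * y)
    xs[t] = x

print(xs.mean(), xs.var())           # ~ 0 and ~ P: X_i remains N(0, P)
```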


Posterior Matching Scheme: BSC

- Set P_X = Bern(1/2) (capacity-achieving)
- The PM scheme coincides with Horstein's median rule:

X_{n+1} = F_X^{-1}(F_{Θ_0|Y^n}(Θ_0 | Y^n)) = 1 if Θ_0 > median of f_{Θ_0|Y^n}(· | Y^n), and 0 otherwise

- F_X^{-1} quantizes above/below 1/2



Multiple access channel, no feedback

[Figure: two encoders send X_1 and X_2 into the channel p(Y | X_1, X_2), which outputs Y.]

Capacity region:

R_1 < I(X_1; Y | X_2)
R_2 < I(X_2; Y | X_1)
R_1 + R_2 < I(X_1, X_2; Y)

for some product input distribution p(x_1) p(x_2) (the full region is the convex closure of the union over all such distributions)


Multiple access channel, no feedback

[Figure: the no-feedback capacity region of the binary adder MAC — the pentagon with R_1 < 1, R_2 < 1, and R_1 + R_2 < 1.5.]
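The corner numbers come from uniform Bern(1/2) inputs to the adder MAC Y = X_1 + X_2: the quick computation below, a sketch, reproduces the 1.5-bit sum-rate bound I(X_1, X_2; Y) = H(Y).

```python
import numpy as np

# Y = X1 + X2 with X1, X2 i.i.d. Bern(1/2): P(Y = 0, 1, 2) = 1/4, 1/2, 1/4
py = np.array([0.25, 0.5, 0.25])
sum_rate = -(py * np.log2(py)).sum()   # I(X1, X2; Y) = H(Y): channel is deterministic
print(sum_rate)                        # 1.5 bits per channel use
# Individual bounds: given X2, Y reveals X1 exactly, so I(X1; Y | X2) = 1.
```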


Does feedback help in MAC?

Yes! Gaarder-Wolf 1975


Erasure MAC with feedback

- R_sym = 2/3: N uncoded transmissions + N/2 one-sided retransmissions:

transmitter 1: 010010101011100
transmitter 2: 110100011011001
output:        120110112022101

The probability of an ambiguous output (a 1) is 1/2, so about N/2 bits are erased.

- Transmitter 1 retransmits the erased bits over the next N/2 transmissions

N bits per user are sent over N + N/2 transmissions ⇒ R = 2/3 is achievable
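A minimal simulation sketch of this one-sided retransmission strategy over the noiseless adder MAC (names illustrative): when the output is 1, the receiver cannot split it into (x_1, x_2) until transmitter 1 resends its bit.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
b1 = rng.integers(0, 2, n)
b2 = rng.integers(0, 2, n)
y = b1 + b2                       # adder MAC output in {0, 1, 2}

erased = (y == 1)                 # ambiguous outputs, probability 1/2
uses = n + int(erased.sum())      # Tx 1 resends its bit at each erased slot
# knowing b1 everywhere resolves b2 = y - b1, so both messages get through
print(n / uses)                   # per-user rate ~ 2/3
```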


Block feedback coding scheme: erasure MAC with feedback

- R_sym = 3/4: N uncoded transmissions + N/4 two-sided retransmissions + N/16 + ...
- The two encoders can cooperate, each sending half of the N/2 erased bits over the following N/4 transmissions

R = N / (N + N/4 + N/16 + ...) = 3/4

[Figure: the symmetric rate point (3/4, 3/4) relative to the no-feedback region.]


Erasure MAC with feedback (Gaarder-Wolf 1975)

- R_sym = 0.7602: N uncoded transmissions + N / (2 log_2 3) cooperative retransmissions
- The encoders can cooperate and use the three symbol pairs (0, 0), (1, 1), and (1, 0), resolving erasures at log_2 3 bits per channel use

R = N / (N + N / (2 log_2 3)) = 0.7602

[Figure: the symmetric rate point (0.76, 0.76).]


Can we do better? Cover-Leung inner bound

R_sym = 0.7911 (Cover-Leung 1981)

Theorem. (R_1, R_2) is achievable for the MAC with feedback if

R_1 < I(X_1; Y | X_2, U)
R_2 < I(X_2; Y | X_1, U)
R_1 + R_2 < I(X_1, X_2; Y)

for some p(u) p(x_1 | u) p(x_2 | u).

[Figure: block diagram — in block j, both encoders see the feedback Y^n(j - 1); encoder 1 sends X_1^n(j) as a function of (M̃_{2,j-1}, M_{1,j}), encoder 2 sends X_2^n(j) as a function of (M_{2,j-1}, M_{2,j}); the decoder observes Y.]


Cover-Leung Achievability proof

- Block Markov coding: messages are sent over b blocks of transmission



Two-way channel

Shannon inner bound:

R_1 < I(X_1; Y | X_2)
R_2 < I(X_2; Y | X_1)

for some p(x_1) p(x_2)


Two-way channel

Shannon outer bound:

R_1 < I(X_1; Y | X_2)
R_2 < I(X_2; Y | X_1)

for some p(x_1, x_2)


Directed information

1. Entropy:

H(Y^n) = Σ_{i=1}^n H(Y_i | Y^{i-1})

2. Conditional entropy:

H(Y^n | X^n) = Σ_{i=1}^n H(Y_i | Y^{i-1}, X^n)

3. Causally conditioned entropy:

H(Y^n || X^n) = Σ_{i=1}^n H(Y_i | Y^{i-1}, X^i)

1 - 2 ⇒ I(Y^n; X^n), the mutual information
1 - 3 ⇒ I(X^n → Y^n), the directed information


Directed information

The directed information from a random vector A^N to another random vector B^N is

I(A^N → B^N) = Σ_{n=1}^N I(A^n; B_n | B^{n-1})

Compare the mutual information:

I(A^N; B^N) = Σ_{n=1}^N I(A^N; B_n | B^{n-1})
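For small alphabets and short horizons, both sums can be computed by direct enumeration. A sketch for N = 2 (all function names illustrative): for a memoryless channel used without feedback, the directed information I(A^2 → B^2) equals the mutual information, here 2(1 - H(0.1)).

```python
import numpy as np

def mi(p2):
    """Mutual information (bits) from a 2-D joint pmf."""
    pu, pv = p2.sum(1, keepdims=True), p2.sum(0, keepdims=True)
    m = p2 > 0
    return float((p2[m] * np.log2(p2[m] / (pu * pv)[m])).sum())

def directed_info_2(p):
    """I(A^2 -> B^2) = I(A1;B1) + I(A1,A2;B2|B1); p is indexed [a1,a2,b1,b2]."""
    total = mi(p.sum(axis=(1, 3)))            # I(A1; B1)
    for b1 in range(p.shape[2]):
        pb1 = p[:, :, b1, :].sum()
        if pb1 > 0:                           # add pb1 * I(A1,A2; B2 | B1=b1)
            cond = p[:, :, b1, :] / pb1
            total += pb1 * mi(cond.reshape(-1, p.shape[3]))
    return total

eps = 0.1                                     # B_n = A_n through a BSC(0.1)
bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
p = np.einsum('i,j,ik,jl->ijkl', [0.5, 0.5], [0.5, 0.5], bsc, bsc)
h = -eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps)
print(directed_info_2(p), 2 * (1 - h))        # both ~ 1.062 bits
```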


TWC capacity region (Kramer 2003)

Theorem. Let R_N be the set of rate pairs (R_1, R_2) with

R_1 ≤ (1/N) I(X_1^N → Y^N || X_2^N)
R_2 ≤ (1/N) I(X_2^N → Y^N || X_1^N)

for some p(x_1^N || y^{N-1}) p(x_2^N || y^{N-1}). Then R = ∪_N R_N.



Source coding with side information

[Figure: timing diagram for source coding with side information — over times 1..10 the encoder observes the source X_1, ..., X_10 and sends an index W per block; the decoder observes the side information Y_1, ..., Y_10 together with W and outputs the reconstructions X̂_1, ..., X̂_5; block length = 5.]


Source coding with feedforward

[Figure: timing diagram for source coding with feedforward — the encoder observes X_1, ..., X_10 and sends an index W per block; the decoder forms X̂_1, ..., X̂_5 from W and the past source samples X_1, X_2, ... delivered over the feedforward link; block length = 5, shown with delay = 6 alongside a delay-1 feedforward variant.]


Source coding with feedforward

An (N, 2^{NR}) source code with feedforward:

Encoding function: f : X^N → {1, ..., 2^{NR}}

Decoding functions: g_n : {1, ..., 2^{NR}} × X^{n-1} → X̂, n = 1, ..., N



The directed information from the reconstruction X̂^N to the source X^N is

I(X̂^N → X^N) = I(X^N; X̂^N) - Σ_{n=2}^N I(X^{n-1}; X̂_n | X̂^{n-1})


- A direct coding theorem holds for a general source with feedforward, assuming that the joint random process {X_n, X̂_n} is discrete, stationary, and ergodic
- For stationary and ergodic joint processes, the directed information rate exists and is defined by

I(X̂ → X) = lim_{N→∞} (1/N) I(X̂^N → X^N)


Theorem. For a discrete stationary and ergodic source X characterized by a distribution P_X, all rates R such that

R ≥ R*(D) = inf_{P_{X̂|X} : lim_{N→∞} E[d_N(X^N, X̂^N)] ≤ D} I(X̂ → X)

are achievable at expected distortion D.

Proof. The proof uses an AEP for directed quantities,

-(1/N) log P(X̂^N || X^N) → H(X̂ || X) w.p. 1,

and defines a new kind of typicality called "directed typicality."


Thank you

Questions?
