A Converse Coding Theorem for Mismatched Decoding at the Output of Binary-Input Memoryless Channels

V. B. Balakirsky

The author is from the Data Security Association "Confident", St.-Petersburg, Russia.
E-mails: [email protected]

Abstract. An upper bound on the maximal transmission rate over binary-input memoryless channels, provided that the decoding decision rule is given, is derived. If the decision rule is equivalent to maximum likelihood decoding (matched decoding), then the bound coincides with the channel capacity. Otherwise (mismatched decoding), it coincides with a known lower bound.

Key words: channel capacity, mismatched decoding.

The work was supported by a Scholarship from the Swedish Institute, Stockholm, Sweden.


1 Introduction

Shannon's coding theorem on the capacity of memoryless channels [1] may be presented as a result of maximization of the transmission rate over all possible block codes and all possible decoding algorithms. An open information theory problem is to generalize this theorem to the case when optimization over decoding algorithms is forbidden [2-10]. More precisely, the decoder calculates the value of some distortion (metric) function for each codeword and makes a decision that the codeword with the smallest distortion was sent. Since the distortion function is not necessarily matched to the channel's characteristics, the maximal transmission rate can be less than the channel capacity. The current state of the mismatched decoding problem and its connections with other open problems of information and coding theory are given in the recent paper by I. Csiszár and P. Narayan [9], and we cannot do it better.

We deal with binary-input memoryless channels and prove a converse statement to the direct coding theorem [3]. As a result, we obtain that the maximal transmission rate over any binary-input memoryless channel can be found for any distortion function using a single-letter characterization.

The paper is organized as follows. In Section 2 we introduce the notation used in the analysis. The main result is formulated and discussed in Section 3. In Section 4 we discuss the basic ideas of the proof of the theorem, which lead to the results called a 'combinatorial approximation lemma' and a 'permutation lemma'. Section 5 is devoted to the proof of the combinatorial approximation lemma, and Section 6 to the proof of the permutation lemma. Some properties of mismatched decoding and the basic ideas of the proof are illustrated in the Appendix for specific data.

2 Notation

1. The channel's input and output alphabets will be denoted by $X$ and $Y$, respectively. We assume that $X = \{0,1\}$, but write $X$ instead of $\{0,1\}$, meaning that many of the further considerations can be extended to the general case.

2. The number of symbols $x \in X$ in $\mathbf{x} \in X^n$, the number of symbols $y \in Y$ in $\mathbf{y} \in Y^n$, and the number of pairs $(x,y) \in X \times Y$ in $(\mathbf{x},\mathbf{y}) \in X^n \times Y^n$ will be denoted by
$$n_x(\mathbf{x}) = \sum_{j=1}^n \chi\{x_j = x\}, \qquad n_y(\mathbf{y}) = \sum_{j=1}^n \chi\{y_j = y\}, \qquad n_{x,y}(\mathbf{x},\mathbf{y}) = \sum_{j=1}^n \chi\{x_j = x,\ y_j = y\}.$$
Hereafter, $\chi$ denotes the indicator function of the event in the braces: $\chi\{\cdot\} = 1$ if the statement in the braces is true and $\chi\{\cdot\} = 0$ otherwise.

3. We introduce special notation for the empirical probability distributions generated by the numbers $n_x(\mathbf{x})$, $n_y(\mathbf{y})$, and $n_{x,y}(\mathbf{x},\mathbf{y})$, where $(x,y) \in X \times Y$. These distributions will be referred to as the compositions of $\mathbf{x} \in X^n$, $\mathbf{y} \in Y^n$, and $(\mathbf{x},\mathbf{y}) \in X^n \times Y^n$:
$$\mathrm{Comp}(\mathbf{x}) = \{n_x(\mathbf{x})/n\}, \qquad \mathrm{Comp}(\mathbf{y}) = \{n_y(\mathbf{y})/n\}, \qquad \mathrm{Comp}(\mathbf{x},\mathbf{y}) = \{n_{x,y}(\mathbf{x},\mathbf{y})/n\}.$$
Furthermore, we introduce the conditional composition of $\mathbf{y} \in Y^n$ given $\mathbf{x} \in X^n$:
$$\mathrm{Comp}(\mathbf{y}|\mathbf{x}) = \{n_{x,y}(\mathbf{x},\mathbf{y})/n_x(\mathbf{x})\}.$$

4. The probability distributions $\{P_x\}$, $\{V_x(y)\}$, and $\{W_x(y)\}$ will be denoted by $P$, $V$, and $W$, respectively.

5. The set of types $\mathcal{P}^n$ on $X^n$ and the set of conditional types $\mathcal{V}_P^n$ on $Y^n$ given $P \in \mathcal{P}^n$ are introduced as
$$\mathcal{P}^n = \{P : nP_x \text{ are integers for all } x \in X\},$$
$$\mathcal{V}_P^n = \{V : nP_x V_x(y) \text{ are integers for all } (x,y) \in X \times Y\}.$$
The set of sequences of type $P \in \mathcal{P}^n$ and the set of sequences of conditional type $V \in \mathcal{V}_P^n$ given $\mathbf{x} \in T_P^n$ are denoted by
$$T_P^n = \{\mathbf{x} \in X^n : \mathrm{Comp}(\mathbf{x}) = P\}, \qquad T_V^n(\mathbf{x}) = \{\mathbf{y} \in Y^n : \mathrm{Comp}(\mathbf{y}|\mathbf{x}) = V\}.$$
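To make the composition notation concrete, here is a minimal Python sketch; the sequences x and y below are arbitrary illustrative data, not taken from the paper:

```python
from collections import Counter

def composition(x):
    """Comp(x): the empirical distribution {n_x(x)/n} of a sequence."""
    n = len(x)
    return {a: c / n for a, c in Counter(x).items()}

def conditional_composition(x, y):
    """Comp(y|x): the conditional composition {n_xy(x,y)/n_x(x)}."""
    pair_counts = Counter(zip(x, y))
    nx = Counter(x)
    return {(a, b): c / nx[a] for (a, b), c in pair_counts.items()}

x = [0, 1, 0, 0, 1, 1, 0, 0]
y = [0, 1, 2, 0, 3, 1, 0, 2]
print(composition(x))                 # {0: 0.625, 1: 0.375}
print(conditional_composition(x, y))  # e.g. (0, 0): 0.6, (1, 1): 0.667, ...
```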

6. A code $G^n$ of rate $R$ and length $n$, consisting of $e^{nR}$ codewords, such that each codeword has type $P$, i.e., $\mathrm{Comp}(\mathbf{x}) = P$ for all $\mathbf{x} \in G^n$, will be referred to as a $P$-composition code and denoted by $G_P^n$.

7. The distortion (metric) function is introduced as $d = \{d_x(y)\}$, where
$$d_0(y) = 0, \qquad 0 \le d_x(y) \le d_{\max} < \infty, \quad \text{for all } x \in X,\ y \in Y.$$
An additive extension of this function is defined as
$$d(\mathbf{x},\mathbf{y}) = \sum_{j=1}^n d_{x_j}(y_j).$$

8. We use the following notation for the marginal distribution $PV$ on $Y$, the entropy function $H(PV)$, the conditional entropy function $H(V|P)$, the mutual information function $I(P,V)$, and the average distortion function in the ensemble $\{X \times Y,\ P_x V_x(y)\}$:
$$PV(y) = \sum_x P_x V_x(y),$$
$$H(PV) = -\sum_y PV(y) \ln PV(y),$$
$$H(V|P) = -\sum_{x,y} P_x V_x(y) \ln V_x(y),$$
$$I(P,V) = H(PV) - H(V|P),$$
$$d(P,V) = \sum_{x,y} P_x V_x(y)\, d_x(y).$$

9. To simplify formalization, we will write
$$|\tilde V - V| = \max_{x,y} |\tilde V_x(y) - V_x(y)|$$
for any conditional probability distributions $\tilde V = \{\tilde V_x(y)\}$ and $V = \{V_x(y)\}$.

The notation above is conventional and is given to make the paper self-contained. The function introduced below seems to be new. This function will be used throughout the paper.

Definition 2.1 (Upsilon Notation): Let $\{\alpha_n\}$ be a given sequence. We introduce a Boolean function $\Upsilon(\{\alpha_n\})$ such that $\Upsilon(\{\alpha_n\}) = $ 'TRUE' if and only if the following statement is valid: "there exist an $\varepsilon > 0$ and $n_0(\varepsilon) < \infty$ such that $\alpha_n > \varepsilon$ for all $n > n_0(\varepsilon)$". Otherwise, $\Upsilon(\{\alpha_n\}) = $ 'FALSE'. For the values of the function $\Upsilon$ we use the relation '$\le$' meaning that
$$\text{'FALSE'} \le \text{'FALSE'}, \qquad \text{'FALSE'} \le \text{'TRUE'}, \qquad \text{'TRUE'} \le \text{'TRUE'}.$$
For any given sequences $\{\alpha_n\}$ and $\{\beta_n\}$, we say that $\{\beta_n\}$ approximates $\{\alpha_n\}$ if
$$\Upsilon(\{\alpha_n\}) \le \Upsilon(\{\beta_n\}),$$
i.e., $\Upsilon(\{\alpha_n\}) = $ 'TRUE' $\implies \Upsilon(\{\beta_n\}) = $ 'TRUE', and that $\{\alpha_n\}$ approximates $\{\beta_n\}$ if
$$\Upsilon(\{\beta_n\}) \le \Upsilon(\{\alpha_n\}),$$
i.e., $\Upsilon(\{\beta_n\}) = $ 'TRUE' $\implies \Upsilon(\{\alpha_n\}) = $ 'TRUE'. If both statements are valid, we say that $\{\alpha_n\}$ and $\{\beta_n\}$ approximate each other. In this case we write
$$\Upsilon(\{\alpha_n\}) = \Upsilon(\{\beta_n\}),$$
i.e., $\Upsilon(\{\alpha_n\}) = $ 'TRUE' $\iff \Upsilon(\{\beta_n\}) = $ 'TRUE'.
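In standard limit notation (an equivalent restatement, not stated explicitly in the paper), the definition reads
$$\Upsilon(\{\alpha_n\}) = \text{'TRUE'} \iff \liminf_{n \to \infty} \alpha_n > 0,$$
so "$\{\beta_n\}$ approximates $\{\alpha_n\}$" means: whenever $\alpha_n$ stays bounded away from zero, so does $\beta_n$.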

3 Statement of the Problem and a Converse Coding Theorem

Our considerations of mismatched decoding for a memoryless channel $W$ are based on constructing references to a memoryless channel $V$, which can be called a 'test channel'. For formal convenience, we introduce the test channel as a real channel existing between the sender and the receiver, and consider the system model for information transmission given in Fig. 1.

Let us suppose that a $P$-composition block code $G_P^n$ is used to transmit data over two parallel memoryless channels $V = \{V_x(y)\}$ and $W = \{W_x(y')\}$. The conditional probabilities that the decoders $D$ and $D'$ receive the vectors $\mathbf{y} = (y_1,\dots,y_n) \in Y^n$ and $\mathbf{y}' = (y'_1,\dots,y'_n) \in Y^n$, when a codeword $\mathbf{x} = (x_1,\dots,x_n) \in G_P^n$ was sent, are given as
$$V(\mathbf{y}|\mathbf{x}) = \prod_{j=1}^n V_{x_j}(y_j), \qquad W(\mathbf{y}'|\mathbf{x}) = \prod_{j=1}^n W_{x_j}(y'_j). \eqno(3.1)$$
We suppose that the decoders $D$ and $D'$ estimate the transmitted codeword as $\hat{\mathbf{x}} \in G_P^n$ and $\hat{\mathbf{x}}' \in G_P^n$, respectively, using the same partitioning of the space $Y^n$ with respect to the minimal value of the distortion (metric) function $d$, i.e.,
$$d(\hat{\mathbf{x}},\mathbf{y}) = \min_{\mathbf{x} \in G_P^n} d(\mathbf{x},\mathbf{y}), \qquad d(\hat{\mathbf{x}}',\mathbf{y}') = \min_{\mathbf{x}' \in G_P^n} d(\mathbf{x}',\mathbf{y}'). \eqno(3.2)$$
If the minimum is attained for several codewords, we assume that the received vector is decoded incorrectly. Then the decoding error probabilities for the codeword $\mathbf{x}$ can be expressed as
$$P_d^{(n)}(\mathbf{x},V) = \sum_{\mathbf{y}} V(\mathbf{y}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y})\}, \qquad P_d^{(n)}(\mathbf{x},W) = \sum_{\mathbf{y}'} W(\mathbf{y}'|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\mathbf{y}') \ge D(\mathbf{x},\mathbf{y}')\}, \eqno(3.3)$$
where
$$D(\mathbf{x},\mathbf{y}) = \min_{\hat{\mathbf{x}} \in G_P^n \setminus \{\mathbf{x}\}} d(\hat{\mathbf{x}},\mathbf{y}), \qquad D(\mathbf{x},\mathbf{y}') = \min_{\hat{\mathbf{x}}' \in G_P^n \setminus \{\mathbf{x}\}} d(\hat{\mathbf{x}}',\mathbf{y}'). \eqno(3.4)$$
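A minimal Python sketch of the decision rule (3.2), with ties counted as errors as stipulated above; the codebook and received vector are illustrative, and the matrix d is the one used later in the Appendix (Table 1):

```python
import numpy as np

def mismatched_decode(codebook, y, d):
    """Minimum-distortion rule (3.2): choose the codeword minimizing the
    additive distortion d(x, y) = sum_j d_{x_j}(y_j).  A tie is treated
    as a decoding error (return None), as assumed in the paper."""
    scores = np.array([d[x, y].sum() for x in codebook])
    winners = np.flatnonzero(scores == scores.min())
    return None if len(winners) > 1 else codebook[winners[0]]

d = np.array([[0, 0, 0, 0],          # d_0(y): all-zero row
              [4, 3, 1, 0]])         # d_1(y), as in the Appendix (Table 1)
codebook = np.array([[0, 0, 1, 1],
                     [1, 1, 0, 0]])
y = np.array([0, 1, 3, 3])
print(mismatched_decode(codebook, y, d))   # -> [0 0 1 1] (distortion 0 vs 7)
```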

Let
$$P_d^{(n)}(V) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},V), \qquad P_d^{(n)}(W) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},W) \eqno(3.5)$$
denote the maximal decoding error probabilities for the code $G_P^n$, and let $\{P_d^{(n)}(V)\}$ and $\{P_d^{(n)}(W)\}$ denote the sequences of the maximal decoding error probabilities constructed for a sequence of codes $\{G_P^n\}$ having a fixed rate $R$.

The upsilon notation introduced in Section 2 allows us to represent the lower bound on the maximal transmission rate for mismatched decoding as a corollary of the result formulated below.

Theorem 3.1: Let $P$ be a given probability distribution on $X = \{0,1\}$ and let $W$ be a given binary-input memoryless channel. Let
$$\Lambda_d(P,W) = \{V : PV = PW,\ d(P,V) \le d(P,W)\}. \eqno(3.6)$$
Then
$$\Upsilon(\{P_d^{(n)}(V)\}) \le \Upsilon(\{P_d^{(n)}(W)\}) \eqno(3.7)$$
for all $V \in \Lambda_d(P,W)$.

Corollary 3.2: If
$$R > C_d(P,W), \eqno(3.8)$$
where
$$C_d(P,W) = \min_{V \in \Lambda_d(P,W)} I(P,V) \eqno(3.9)$$
and the set $\Lambda_d(P,W)$ is defined in (3.6), then
$$\Upsilon(\{P_d^{(n)}(W)\}) = \text{'TRUE'} \eqno(3.10)$$
for all $P$-composition codes $G_P^n$, i.e., there exist $\varepsilon_d(P,W) > 0$ and $n_0(\varepsilon,d,P,W) < \infty$ such that $P_d^{(n)}(W) \ge \varepsilon_d(P,W)$ for all $n > n_0(\varepsilon,d,P,W)$ and all $P$-composition codes $G_P^n$.
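For binary-input channels, (3.9) is a small convex program, so $C_d(P,W)$ can be evaluated numerically. The sketch below uses scipy's SLSQP solver on the Appendix data; it is my own illustration, and the printed value is only a numerical estimate:

```python
import numpy as np
from scipy.optimize import minimize

P = np.array([0.5, 0.5])
W = np.array([[0.375, 0.375, 0.125, 0.125],    # the channel W of the Appendix
              [0.250, 0.000, 0.250, 0.500]])
d = np.array([[0, 0, 0, 0], [4, 3, 1, 0]], dtype=float)
PW, dPW = P @ W, np.sum(P[:, None] * W * d)

def I(v):                                      # mutual information I(P, V) in nats
    V = v.reshape(2, 4)
    with np.errstate(divide='ignore', invalid='ignore'):
        t = np.where(V > 0, V * np.log(V / (P @ V)), 0.0)
    return np.sum(P[:, None] * t)

cons = [{'type': 'eq',   'fun': lambda v: v.reshape(2, 4).sum(axis=1) - 1},
        {'type': 'eq',   'fun': lambda v: P @ v.reshape(2, 4) - PW},   # PV = PW
        {'type': 'ineq', 'fun': lambda v: dPW - np.sum(P[:, None] * v.reshape(2, 4) * d)}]
res = minimize(I, W.flatten(), method='SLSQP',
               bounds=[(1e-9, 1)] * 8, constraints=cons)
print(res.fun)   # roughly 0.10 nats for these data; the minimizer is close to V in Table 1
```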

Proof: For a sequence of $P$-composition codes $\{G_P^n\}$, we use a converse statement to the coding theorem for the channel $V$ which minimizes the mutual information at the right-hand side of (3.9) and conclude that, if (3.8) is valid, then
$$\Upsilon(\{P_d^{(n)}(V)\}) = \text{'TRUE'}. \eqno(3.11)$$
Combining (3.7) and (3.11), we obtain (3.10). Q.E.D.

Corollary 3.3: If
$$R > C_d(W), \eqno(3.12)$$
where
$$C_d(W) = \max_P C_d(P,W) \eqno(3.13)$$
and the function $C_d(P,W)$ is defined in (3.9), then
$$\Upsilon(\{P_d^{(n)}(W)\}) = \text{'TRUE'} \eqno(3.14)$$
for all codes $G^n$, i.e., there exist $\varepsilon_d(W) > 0$ and $n_0(\varepsilon,d,W) < \infty$ such that
$$P_d^{(n)}(W) \ge \varepsilon_d(W), \quad \text{for all } n > n_0(\varepsilon,d,W), \eqno(3.15)$$
and all codes $G^n$.

Proof: There exists $P$ such that any sequence of codes $\{G^n\}$ having a fixed code rate contains a subsequence of $P$-composition codes $\{G_P^n\}$ having asymptotically the same rate [2]. Thus, (3.14) follows from (3.10). Q.E.D.

Corollary 3.4: Let $PW_0 = \{PW_0(y)\}$ be a given probability distribution on $Y$ and let $\lambda \le 0$ be a given constant. If
$$V_x(y) = f_x \cdot \varphi(y) \cdot e^{\lambda d_x(y)}, \eqno(3.16)$$
where $\{f_x\}$ and $\{\varphi(y)\}$ are chosen in such a way that
$$\sum_y V_x(y) = 1, \quad \text{for all } x \in X, \eqno(3.17)$$
and
$$\sum_x P_x V_x(y) = PW_0(y), \quad \text{for all } y \in Y, \eqno(3.18)$$
then the maximal transmission rate, when a $P$-composition code is used, is not greater than
$$C_d(P,W) = H(PW_0) - H(V|P)$$
for all binary-input channels $W$ such that
$$PW = PW_0, \qquad d(P,W) \ge d(P,V). \eqno(3.19)$$

Proof: If the marginal distribution on $Y$ is fixed by $PW_0$, then the minimization of the mutual information function at the right-hand side of (3.9) is equivalent to the maximization of the conditional entropy function $H(V|P)$ under the linear restrictions (3.17) and (3.18). This function is concave, and we obtain (3.16) as a result of optimization using Lagrange multipliers. The restrictions (3.19) coincide with the restrictions at the right-hand side of (3.6) when we consider them as restrictions on $W$ given $PW$ and $V$. Q.E.D.

Corollary 3.5: For any $P$-composition code $G_P^n$, decoding which minimizes the distortion function $d$ is equivalent to maximum likelihood decoding for the channel $V$ given in (3.16), and the maximal transmission rate is not greater than $I(P,V)$.

Proof: Using (3.1) and (3.16) we write:
$$\ln V(\mathbf{y}|\mathbf{x}) = n \sum_x P_x \ln f_x + \sum_{j=1}^n \ln \varphi(y_j) + \lambda \cdot d(\mathbf{x},\mathbf{y}).$$
Since $\lambda < 0$, maximization of the function $\ln V(\mathbf{y}|\mathbf{x})$ over all codewords is equivalent to the minimization of the distortion function $d(\mathbf{x},\mathbf{y})$. Q.E.D.
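The pair $(\{f_x\},\{\varphi(y)\})$ in (3.16)-(3.18) can be found, for a fixed $\lambda$, by alternately rescaling rows and columns. This Sinkhorn-style iteration is my own illustration, not a procedure from the paper; the value $\lambda = -\ln(2)/1.21 \approx -0.573$ is inferred from the constants used in the Appendix and approximately reproduces the matrix $V$ of Table 1:

```python
import numpy as np

P = np.array([0.5, 0.5])
PW0 = np.array([0.3125, 0.1875, 0.1875, 0.3125])   # target marginal (Appendix)
d = np.array([[0, 0, 0, 0], [4, 3, 1, 0]], dtype=float)
lam = -np.log(2) / 1.21                            # inferred from the Appendix constants

K = np.exp(lam * d)
f, phi = np.ones(2), np.ones(4)
for _ in range(500):
    phi = PW0 / ((P * f) @ K)          # enforce (3.18): sum_x P_x V_x(y) = PW0(y)
    f = 1.0 / (K @ phi)                # enforce (3.17): sum_y V_x(y) = 1
V = f[:, None] * phi[None, :] * K
print(V.round(3))   # close to [[0.474 0.240 0.135 0.151], [0.151 0.135 0.240 0.474]]
```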

Discussion: Let $V$ be a probability distribution minimizing the mutual information at the right-hand side of (3.9). We are interested in the case
$$I(P,V) < R < I(P,W),$$
because, otherwise, maximum likelihood decoding does not achieve an arbitrarily small error probability. Let
$$A_d^n(\mathbf{x}) = \{\mathbf{y} : d(\mathbf{x},\mathbf{y}) \le n \cdot d(P,W)\}.$$
Then $T_V^n(\mathbf{x}), T_W^n(\mathbf{x}) \subseteq A_d^n(\mathbf{x})$. Roughly speaking, a vector $\mathbf{y} \in T_W^n(\mathbf{x})$ will be realized as a result of transmission of $\mathbf{x}$ over the channel $W$. Since $R < I(P,W)$, we can choose a code $G_P^n$ for which "almost all" vectors belonging to $T_W^n(\mathbf{x})$ do not coincide with vectors belonging to $T_W^n(\mathbf{x}')$, $\mathbf{x}' \in G_P^n \setminus \{\mathbf{x}\}$. However, to analyze the decoding error probability in our case it is necessary to estimate the number of $d$-"bad" points for $\mathbf{x}$, i.e., the size of the intersection of $T_W^n(\mathbf{x})$ and $A_d^n(\mathbf{x}')$, $\mathbf{x}' \in G_P^n \setminus \{\mathbf{x}\}$. As is well known [2],
$$\ln |T_W^n(\mathbf{x})|/n \simeq H(W|P), \qquad \ln |T_V^n(\mathbf{x})|/n \simeq H(V|P),$$
and because $V$ minimizes the mutual information at the right-hand side of (3.9), it maximizes the conditional entropy function $H(V|P)$. Therefore, most of the vectors belonging to $A_d^n(\mathbf{x})$ have the conditional type $V$. Since $R > I(P,V)$, the size of the intersection of $T_V^n(\mathbf{x})$ and the union of the $T_V^n(\mathbf{x}')$, $\mathbf{x}' \in G_P^n \setminus \{\mathbf{x}\}$, is asymptotically the same as $|T_V^n(\mathbf{x})|$ (note that we use only a weak converse statement for the channel $V$ and lower-bound this size as $\varepsilon \cdot |T_V^n(\mathbf{x})|$, where $\varepsilon > 0$ does not depend on $n$). In fact, the main result of the paper is the statement that, for binary-input channels, this condition is sufficient to show that the size of the intersection between $T_W^n(\mathbf{x})$ and the union of the $A_d^n(\mathbf{x}')$, $\mathbf{x}' \in G_P^n \setminus \{\mathbf{x}\}$, is asymptotically the same as $|T_W^n(\mathbf{x})|$.

4 Basic Ideas of the Proof of the Converse Coding Theorem

4.1 Combinatorial Approximation of Memoryless Channels

Let us consider the transmission of a codeword $\mathbf{x} \in G_P^n$ over a memoryless channel $V$. Suppose that $V$ is a type on $Y^n$ given $P \in \mathcal{P}^n$, i.e., $V \in \mathcal{V}_P^n$. Then we may write
$$\mathrm{Comp}(\mathbf{y}|\mathbf{x}) \approx V \quad \text{with high probability},$$
meaning that, with high probability, the conditional composition of the received sequence given $\mathbf{x}$ is close to the conditional type $V$. The exact formulation of this statement is given below.

Lemma 4.1 ([2, Section 2.1]): Let
$$\mathcal{N}(P,V) = \{n : P \in \mathcal{P}^n \text{ and } V \in \mathcal{V}_P^n\}$$
be the set of lengths such that a given probability distribution $P$ is a type on $X^n$ and a given conditional probability distribution $V$ is a conditional type on $Y^n$ for that $P$. For any increasing sequence of lengths $n \in \mathcal{N}(P,V)$, there exist sequences $\{\delta_n\}$ such that
$$\delta_n \to 0, \quad \delta_n \sqrt{n} \to \infty, \quad \text{as } n \to \infty,$$
and, for any $\mathbf{x} \in T_P^n$,
$$\sum_{\mathbf{y}} V(\mathbf{y}|\mathbf{x}) \cdot \chi\{|\mathrm{Comp}(\mathbf{y}|\mathbf{x}) - V| < \delta_n\} \ge 1 - \gamma_n, \eqno(4.1)$$
where $\{\gamma_n\}$ is a sequence, depending on $\{\delta_n\}$, such that $\gamma_n \to 0$ as $n \to \infty$.

Convention 4.2: We assume that $P_x > 0$ for all $x \in X$ and that $n \in \mathcal{N}(P,V)$. In later considerations, when we deal with two channels $V$ and $W$, we assume that $n \in \mathcal{N}(P,V) \cap \mathcal{N}(P,W)$. The sequence $\{\delta_n\}$ satisfying the conditions of Lemma 4.1 is assumed to be given.

Let
$$[V] = \{\tilde V \in \mathcal{V}_P^n : |\tilde V - V| < \delta_n\}. \eqno(4.2)$$
Then the inequality (4.1) can be rewritten as
$$V(T_{[V]}^n(\mathbf{x})|\mathbf{x}) \ge 1 - \gamma_n \eqno(4.3)$$
for any $\mathbf{x} \in T_P^n$, where
$$T_{[V]}^n(\mathbf{x}) = \bigcup_{\tilde V \in [V]} T_{\tilde V}^n(\mathbf{x}). \eqno(4.4)$$
Note also that
$$d(\mathbf{x},\mathbf{y})/n = d(P,V), \quad \text{for all } \mathbf{y} \in T_V^n(\mathbf{x}), \eqno(4.5)$$
and
$$|d(\mathbf{x},\mathbf{y})/n - d(P,V)| < \delta_n \cdot d_{\max}, \quad \text{for all } \mathbf{y} \in T_{[V]}^n(\mathbf{x}). \eqno(4.6)$$
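A quick Monte Carlo check of (4.1) for the Appendix test channel $V$; this is my own illustration, and $\delta_n = n^{-1/3}$ is one admissible choice (it tends to 0 while $\delta_n \sqrt{n} \to \infty$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.repeat([0, 1], n // 2)          # a codeword of type P = (1/2, 1/2)
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
delta_n = n ** (-1 / 3)                # ~ 0.079

hits = 0
for _ in range(200):
    u = rng.random(n)
    y = (u[:, None] < V[x].cumsum(axis=1)).argmax(axis=1)   # memoryless channel V
    comp = np.array([[np.mean(y[x == a] == b) for b in range(4)]
                     for a in range(2)])                    # Comp(y|x)
    hits += np.max(np.abs(comp - V)) < delta_n
print(hits / 200)    # close to 1, as (4.1) predicts
```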

The conditional probability of receiving $\mathbf{y}$ depends only on $\mathrm{Comp}(\mathbf{y}|\mathbf{x})$, and, for all $\tilde V \in \mathcal{V}_P^n$, we may write
$$V(\mathbf{y}|\mathbf{x}) = p(\tilde V|V) \cdot \frac{1}{|T_{\tilde V}^n(\mathbf{x})|}, \quad \text{if } \mathrm{Comp}(\mathbf{y}|\mathbf{x}) = \tilde V, \eqno(4.7)$$
where
$$p(\tilde V|V) = |T_{\tilde V}^n(\mathbf{x})| \cdot \exp\Big\{n \sum_{x,y} P_x \tilde V_x(y) \ln V_x(y)\Big\}. \eqno(4.8)$$
Therefore, information transmission over a memoryless channel $V$ can be represented as a choice of the conditional composition in accordance with the distribution $p(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, and a choice of a particular received sequence in accordance with the uniform distribution on $T_{\tilde V}^n(\mathbf{x})$. The step which we call 'combinatorial approximation' consists in substituting different probabilities $p'(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, for $p(\tilde V|V)$, $\tilde V \in \mathcal{V}_P^n$, in the expression at the right-hand side of (4.7). Then we obtain a different channel, which may have memory. For example, we can assign $p'(\tilde V|V)$ as the uniform distribution on the set $[V]$ and, because of (4.3), expect that the decoding error probability for such a channel is approximately the same as for $V$. However, we will use a less tight approximation and assign $p'(\tilde V|V)$ as the indicator function of the event $\tilde V = V$. The definition below formalizes this step, and the statement of Lemma 4.4 shows that this approximation can be used for our purposes. The considerations above are also valid for the channel $W$, and we continue the parallel definitions of Section 3.
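The decomposition (4.7)-(4.8) just says that the channel first picks a conditional type with probability $p(\tilde V|V)$ and then a sequence uniformly inside the type class. A small sanity check with a binary output alphabet (the toy numbers below are assumed, not from the paper):

```python
import math

nP0, nP1 = 4, 4                        # n = 8, type P = (1/2, 1/2)
V = [[0.8, 0.2], [0.3, 0.7]]           # toy binary-output channel V_x(y)

def p_type(c00, c10):
    """p(V~|V) = |T_V~(x)| * prod_{x,y} V_x(y)^{n_xy}  (cf. (4.8));
    c00, c10 are the counts n_{x,y} of output 0 given x = 0 and x = 1."""
    size = math.comb(nP0, c00) * math.comb(nP1, c10)
    prob = (V[0][0] ** c00 * V[0][1] ** (nP0 - c00)
            * V[1][0] ** c10 * V[1][1] ** (nP1 - c10))
    return size * prob

# Summing p(V~|V) over all conditional types recovers total probability 1.
print(sum(p_type(i, j) for i in range(nP0 + 1) for j in range(nP1 + 1)))  # 1.0
```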

Definition 4.3: The channels $V^n = \{V^n(\mathbf{y}|\mathbf{x})\}$ and $W^n = \{W^n(\mathbf{y}'|\mathbf{x})\}$, where
$$V^n(\mathbf{y}|\mathbf{x}) = \begin{cases} |T_V^n(\mathbf{x})|^{-1}, & \text{if } \mathbf{y} \in T_V^n(\mathbf{x}), \\ 0, & \text{otherwise,} \end{cases} \qquad W^n(\mathbf{y}'|\mathbf{x}) = \begin{cases} |T_W^n(\mathbf{x})|^{-1}, & \text{if } \mathbf{y}' \in T_W^n(\mathbf{x}), \\ 0, & \text{otherwise,} \end{cases} \eqno(4.9)$$
will be referred to as the combinatorial channels $V^n$ and $W^n$. For these channels we define the decoding error probabilities for a codeword $\mathbf{x} \in G_P^n$:
$$P_d^{(n)}(\mathbf{x},V^n) = \sum_{\mathbf{y}} V^n(\mathbf{y}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y})\}, \qquad P_d^{(n)}(\mathbf{x},W^n) = \sum_{\mathbf{y}'} W^n(\mathbf{y}'|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\mathbf{y}') \ge D(\mathbf{x},\mathbf{y}')\}, \eqno(4.10)$$
the maximal decoding error probabilities
$$P_d^{(n)}(V^n) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},V^n), \qquad P_d^{(n)}(W^n) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},W^n), \eqno(4.11)$$
and the sequences $\{P_d^{(n)}(V^n)\}$, $\{P_d^{(n)}(W^n)\}$ in the same way as for the channels $V$ and $W$ in (3.3)-(3.5).

Lemma 4.4 (Combinatorial Approximation Lemma): The sequences of the maximal decoding error probabilities for the memoryless channels $V$ and $W$, defined in (3.1), and for the combinatorial channels $V^n$ and $W^n$, defined in (4.9), approximate each other, i.e.,
$$\Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V^n)\}), \qquad \Upsilon(\{P_d^{(n)}(W)\}) = \Upsilon(\{P_d^{(n)}(W^n)\}). \eqno(4.12)$$

4.2 Permutation Lemma

The results of the previous subsection are valid for any memoryless channel. In particular, they are valid for a channel $V$ satisfying the restrictions
$$V \in \mathcal{V}_P^n, \qquad I(P,V) \le I(P,W), \qquad PV = PW, \qquad d(P,V) \le d(P,W). \eqno(4.13)$$
The idea of the proof of the theorem is to connect the maximal decoding error probabilities for the combinatorial channels $V^n$ and $W^n$ in such a way that $\{P_d^{(n)}(W^n)\}$ approximates $\{P_d^{(n)}(V^n)\}$, i.e.,
$$\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(W^n)\}).$$
Then, based on the combinatorial approximation lemma, we can use this inequality in the middle part of a logical chain whose first part is the statement
$$\Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V^n)\})$$
and the last part is the statement
$$\Upsilon(\{P_d^{(n)}(W^n)\}) = \Upsilon(\{P_d^{(n)}(W)\}).$$
As a result, we obtain (3.7) and prove the theorem.

Lemma 4.5 (Permutation Lemma): Let the probability distributions $P \in \mathcal{P}^n$ and $W \in \mathcal{V}_P^n$ be given, and let $G_P^n$ be a $P$-composition binary code. Let $V$ be some probability distribution satisfying the restrictions (4.13). Then the sequence of the maximal decoding error probabilities for the combinatorial channel $W^n$ approximates the sequence of the maximal decoding error probabilities for the combinatorial channel $V^n$, i.e.,
$$\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(W^n)\}). \eqno(4.14)$$

5 Proof of the Combinatorial Approximation Lemma

The distortion (metric) function was introduced in Section 2 in such a way that, for binary-input channels,
$$d_0(y) = 0, \qquad 0 \le d_1(y) \le d_{\max} < \infty, \quad \text{for all } y \in Y.$$
Without loss of generality, we assume that
$$d_1(0) = 0. \eqno(5.1)$$

Proposition 5.1: Let $\overline V^n = \{\overline V^n(\bar{\mathbf{y}}|\mathbf{x})\}$ and $\underline V^n = \{\underline V^n(\underline{\mathbf{y}}|\mathbf{x})\}$ be the combinatorial channels constructed for the memoryless channels $\overline V = \{\overline V_x(y)\}$ and $\underline V = \{\underline V_x(y)\}$ such that
$$\overline V_0(y) = V_0(y) = \underline V_0(y), \quad \text{for all } y \in Y, \eqno(5.2)$$
and
$$\overline V_1(y) = \begin{cases} V_1(y) + (|Y|-1)\delta_n, & \text{if } y = 0, \\ V_1(y) - \delta_n, & \text{if } y \ne 0, \end{cases} \qquad \underline V_1(y) = \begin{cases} V_1(y) - (|Y|-1)\delta_n, & \text{if } y = 0, \\ V_1(y) + \delta_n, & \text{if } y \ne 0. \end{cases} \eqno(5.3)$$

Then the following statements are valid:
$$\Upsilon(\{P_d^{(n)}(\overline V^n)\}) \le \Upsilon(\{P_d^{(n)}(V)\}) \le \Upsilon(\{P_d^{(n)}(\underline V^n)\}), \eqno(5.4)$$
where
$$P_d^{(n)}(\overline V^n) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},\overline V^n), \qquad P_d^{(n)}(\underline V^n) = \max_{\mathbf{x} \in G_P^n} P_d^{(n)}(\mathbf{x},\underline V^n),$$
and
$$P_d^{(n)}(\mathbf{x},\overline V^n) = \sum_{\bar{\mathbf{y}}} \overline V^n(\bar{\mathbf{y}}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\bar{\mathbf{y}}) \ge D(\mathbf{x},\bar{\mathbf{y}})\}, \qquad P_d^{(n)}(\mathbf{x},\underline V^n) = \sum_{\underline{\mathbf{y}}} \underline V^n(\underline{\mathbf{y}}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\underline{\mathbf{y}}) \ge D(\mathbf{x},\underline{\mathbf{y}})\}.$$

Proof of the Combinatorial Approximation Lemma based on Proposition 5.1: Let us note that Proposition 5.1 also gives the inequalities
$$\Upsilon(\{P_d^{(n)}(V')\}) \le \Upsilon(\{P_d^{(n)}(\overline V^n)\}), \qquad \Upsilon(\{P_d^{(n)}(\underline V^n)\}) \le \Upsilon(\{P_d^{(n)}(V'')\}), \eqno(5.5)$$
where
$$V'_0(y) = V_0(y) = V''_0(y), \quad \text{for all } y \in Y,$$
and
$$V'_1(y) = \begin{cases} V_1(y) + 2(|Y|-1)\delta_n, & \text{if } y = 0, \\ V_1(y) - 2\delta_n, & \text{if } y \ne 0, \end{cases} \qquad V''_1(y) = \begin{cases} V_1(y) - 2(|Y|-1)\delta_n, & \text{if } y = 0, \\ V_1(y) + 2\delta_n, & \text{if } y \ne 0. \end{cases}$$
Using the converse statement to the coding theorem for the channels $V'$, $V$, and $V''$, we obtain
$$\Upsilon(\{P_d^{(n)}(V')\}) = \Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(V'')\}). \eqno(5.6)$$
Hence, (5.4)-(5.6) lead to the statement
$$\Upsilon(\{P_d^{(n)}(\overline V^n)\}) = \Upsilon(\{P_d^{(n)}(V)\}) = \Upsilon(\{P_d^{(n)}(\underline V^n)\}). \eqno(5.7)$$

However, as is easy to see,
$$\Upsilon(\{P_d^{(n)}(\overline V^n)\}) \le \Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(\underline V^n)\}), \eqno(5.8)$$
and combining (5.7) and (5.8) we complete the proof of the combinatorial approximation lemma.

Proof of Proposition 5.1: The idea of the proof is to construct references to the vectors $\bar{\mathbf{y}} \in T_{\overline V}^n(\mathbf{x})$ and $\underline{\mathbf{y}} \in T_{\underline V}^n(\mathbf{x})$ located at the minimal Hamming distance from the received vector $\mathbf{y} \in T_{[V]}^n(\mathbf{x})$.

Let us introduce the sets
$$S_{\overline V}^n(\mathbf{x},\mathbf{y}) = \{\bar{\mathbf{y}} \in T_{\overline V}^n(\mathbf{x}) : d_H(\bar{\mathbf{y}},\mathbf{y}) = \min_{\mathbf{y}' \in T_{\overline V}^n(\mathbf{x})} d_H(\mathbf{y}',\mathbf{y})\},$$
$$S_{\underline V}^n(\mathbf{x},\mathbf{y}) = \{\underline{\mathbf{y}} \in T_{\underline V}^n(\mathbf{x}) : d_H(\underline{\mathbf{y}},\mathbf{y}) = \min_{\mathbf{y}' \in T_{\underline V}^n(\mathbf{x})} d_H(\mathbf{y}',\mathbf{y})\}.$$
Then, using (5.1), we note that
$$\chi\{d(\mathbf{x},\bar{\mathbf{y}}) \ge D(\mathbf{x},\bar{\mathbf{y}})\} \le \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y})\} \le \chi\{d(\mathbf{x},\underline{\mathbf{y}}) \ge D(\mathbf{x},\underline{\mathbf{y}})\} \eqno(5.9)$$
for all $\bar{\mathbf{y}} \in S_{\overline V}^n(\mathbf{x},\mathbf{y})$ and $\underline{\mathbf{y}} \in S_{\underline V}^n(\mathbf{x},\mathbf{y})$. Let us also introduce the uniform distributions on the sets $S_{\overline V}^n(\mathbf{x},\mathbf{y})$ and $S_{\underline V}^n(\mathbf{x},\mathbf{y})$:
$$R_{\overline V}^n(\bar{\mathbf{y}}|\mathbf{x},\mathbf{y}) = \begin{cases} |S_{\overline V}^n(\mathbf{x},\mathbf{y})|^{-1}, & \text{if } \bar{\mathbf{y}} \in S_{\overline V}^n(\mathbf{x},\mathbf{y}), \\ 0, & \text{otherwise,} \end{cases} \qquad R_{\underline V}^n(\underline{\mathbf{y}}|\mathbf{x},\mathbf{y}) = \begin{cases} |S_{\underline V}^n(\mathbf{x},\mathbf{y})|^{-1}, & \text{if } \underline{\mathbf{y}} \in S_{\underline V}^n(\mathbf{x},\mathbf{y}), \\ 0, & \text{otherwise.} \end{cases} \eqno(5.10)$$
Note that
$$\sum_{\mathbf{y} \in T_{[V]}^n(\mathbf{x})} V(\mathbf{y}|\mathbf{x}) \cdot R_{\overline V}^n(\bar{\mathbf{y}}|\mathbf{x},\mathbf{y}) = \overline V^n(\bar{\mathbf{y}}|\mathbf{x}). \eqno(5.11)$$
To prove (5.11), we use the symmetry properties of the sets $T_{[V]}^n(\mathbf{x})$ and $T_{\overline V}^n(\mathbf{x})$ and conclude that the sum at the left-hand side of (5.11) is the same for all $\bar{\mathbf{y}} \in T_{\overline V}^n(\mathbf{x})$ and that this sum is equal to zero if $\bar{\mathbf{y}} \notin T_{\overline V}^n(\mathbf{x})$. Therefore, this sum gives the uniform distribution on $T_{\overline V}^n(\mathbf{x})$, which is $\overline V^n(\bar{\mathbf{y}}|\mathbf{x})$.

Using (5.9)-(5.11), we write:
$$P_d^{(n)}(\mathbf{x},V) \ge \sum_{\mathbf{y} \in T_{[V]}^n(\mathbf{x})} V(\mathbf{y}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y})\} \eqno(5.12)$$
$$= \sum_{\mathbf{y} \in T_{[V]}^n(\mathbf{x})} V(\mathbf{y}|\mathbf{x}) \sum_{\bar{\mathbf{y}}} R_{\overline V}^n(\bar{\mathbf{y}}|\mathbf{x},\mathbf{y}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y})\}$$
$$\ge \sum_{\mathbf{y} \in T_{[V]}^n(\mathbf{x})} V(\mathbf{y}|\mathbf{x}) \sum_{\bar{\mathbf{y}}} R_{\overline V}^n(\bar{\mathbf{y}}|\mathbf{x},\mathbf{y}) \cdot \chi\{d(\mathbf{x},\bar{\mathbf{y}}) \ge D(\mathbf{x},\bar{\mathbf{y}})\}$$
$$= \sum_{\bar{\mathbf{y}}} \overline V^n(\bar{\mathbf{y}}|\mathbf{x}) \cdot \chi\{d(\mathbf{x},\bar{\mathbf{y}}) \ge D(\mathbf{x},\bar{\mathbf{y}})\} = P_d^{(n)}(\mathbf{x},\overline V^n).$$
Similar considerations lead to the inequality
$$P_d^{(n)}(\mathbf{x},V) \le \gamma_n + P_d^{(n)}(\mathbf{x},\underline V^n). \eqno(5.13)$$
Since the inequalities (5.12) and (5.13) are valid for all $\mathbf{x} \in G_P^n$, we obtain
$$P_d^{(n)}(\overline V^n) \le P_d^{(n)}(V) \le \gamma_n + P_d^{(n)}(\underline V^n)$$
and prove (5.4). Q.E.D.

6 Proof of the Permutation Lemma

6.1 Basic Ideas of the Proof

The proof of the permutation lemma can be presented as a result of several sequential steps, which are described in Subsections 6.2-6.6.

We introduce a combinatorial broadcast channel $\{F^n(\mathbf{y},\mathbf{y}'|\mathbf{x})\}$, which has a codeword $\mathbf{x} \in G_P^n$ at the input and elements of the sets $T_V^n(\mathbf{x})$ and $T_W^n(\mathbf{x})$ at the output. The probabilities $F^n(\mathbf{y},\mathbf{y}'|\mathbf{x})$ are assigned in such a way that
$$\sum_{\mathbf{y}'} F^n(\mathbf{y},\mathbf{y}'|\mathbf{x}) = V^n(\mathbf{y}|\mathbf{x}), \qquad \sum_{\mathbf{y}} F^n(\mathbf{y},\mathbf{y}'|\mathbf{x}) = W^n(\mathbf{y}'|\mathbf{x}), \eqno(6.1)$$

and
$$F^n(\mathbf{y},\mathbf{y}'|\mathbf{x}) > 0 \iff d_H(\mathbf{y},\mathbf{y}') = k, \eqno(6.2)$$
where
$$k = \min_{\mathbf{y}' \in T_W^n(\mathbf{x})} d_H(\mathbf{y},\mathbf{y}'), \qquad \mathbf{y} \in T_V^n(\mathbf{x}). \eqno(6.3)$$

If $R > I(P,V)$, then the decoder $D$, which receives $\mathbf{y}$, cannot realize a reliable decoding procedure, i.e., with high probability, there exists an incorrect codeword $\hat{\mathbf{x}}$ such that
$$d(\hat{\mathbf{x}},\mathbf{y}) \le d(\mathbf{x},\mathbf{y}). \eqno(6.4)$$
Using (6.1) and the inequality $d(P,V) \le d(P,W)$, we conclude that
$$d(\mathbf{x},\mathbf{y}) \le d(\mathbf{x},\mathbf{y}'), \eqno(6.5)$$
i.e., the conditions for the decoder $D'$, which receives $\mathbf{y}'$, are worse than the conditions for the decoder $D$ from the point of view of the correct codeword $\mathbf{x}$. Our intention is to prove that the conditions for $D'$, as a rule, are not worse than the conditions for $D$ from the point of view of the incorrect codeword $\hat{\mathbf{x}}$, i.e., as a rule,
$$d(\hat{\mathbf{x}},\mathbf{y}') \le d(\hat{\mathbf{x}},\mathbf{y}). \eqno(6.6)$$
Then, using (6.4)-(6.6), we conclude that, as a rule, $d(\hat{\mathbf{x}},\mathbf{y}') \le d(\mathbf{x},\mathbf{y}')$, and $D'$ cannot do better than $D$.

The result is named a 'permutation lemma' since $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{y}' \in T_W^n(\mathbf{x})$ can be obtained one from the other by permutations of the components. This statement follows from the equation $PV = PW$, which is very important for our considerations. The minimal number of pairwise permutations transforming $\mathbf{y}$ to some element of the set $T_W^n(\mathbf{x})$ is equal to $k/2$, where $k$ is defined in (6.3). In Subsection 6.2 we describe the structure of these permutations and note that there is a complementary property: for any given $x \in X$, the set $Y$ can be split into two disjoint subsets, $Y_x^+$ and $Y_x^-$, such that
$$x_j = x,\ y_j \ne y'_j \implies y_j \in Y_x^-,\ y'_j \in Y_x^+ \quad \text{or} \quad y_j \in Y_x^+,\ y'_j \in Y_x^-. \eqno(6.7)$$
On the basis of this fact we represent $F^n$ as a combination of four conditional distributions, $Q^n$, $U^{n-k}$, $\Delta V^k$, and $\Delta W^k$, in Subsection 6.3.

The distribution $U^{n-k}$ is used to assign the coinciding symbols in $\mathbf{y}$ and $\mathbf{y}'$, and we interpret it as a common element of $V^n$ and $W^n$. The distributions $\Delta V^k$ and $\Delta W^k$ are used to assign the noncoinciding symbols in $\mathbf{y}$ and $\mathbf{y}'$, and we interpret them as individual contributions of $V^n$ and $W^n$. A binary vector $\mathbf{z}$ of length $n$ and Hamming weight $k$, which determines the positions $j$ such that $y_j \ne y'_j$, is assigned in accordance with the distribution $Q^n$. At the end of Subsection 6.3 we represent the transformation of $\mathbf{y} \in T_V^n(\mathbf{x})$ to $\mathbf{y}' \in T_W^n(\mathbf{x})$ as an interchange of the individual contributions of $V^n$ and $W^n$.

Since $d(P,V) \le d(P,W)$, all the transformations of $\mathbf{y} \in T_V^n(\mathbf{x})$ to $\mathbf{y}' \in T_W^n(\mathbf{x})$ do not decrease the value of the distortion function for the codeword $\mathbf{x}$. If $\hat{\mathbf{x}} \in G_P^n \setminus \{\mathbf{x}\}$ is an incorrect codeword, then a particular transformation either increases or decreases the value of the distortion function for $\hat{\mathbf{x}}$. The result depends on the distributions of the components $y_j \ne y'_j$ located at positions where $x_j \ne \hat x_j$. In Subsection 6.4 we show that there are distributions which definitely do not increase this value. Therefore, we conclude that $d(\hat{\mathbf{x}},\mathbf{y}') \le d(\mathbf{x},\mathbf{y}')$ if $d(\hat{\mathbf{x}},\mathbf{y}) \le d(\mathbf{x},\mathbf{y})$, and we call these transformations of $\mathbf{y}$ to $\mathbf{y}'$ permutations conserving relations between distortions.

The last step of the proof is to show that the decoder $D$ can restrict the code $G_P^n$ in such a way that the distribution on the components where $\mathbf{y}$ is transformed to $\mathbf{y}'$ is fixed. On the other hand, the restriction of the code keeps the converse statement to the coding theorem for the channel $V$ valid. These considerations are given in Subsection 6.5.

A very short formal proof of the permutation lemma is given in Subsection 6.6.

We recommend reading Subsections 6.2-6.5 together with the material given in the Appendix, where the main steps of the proof are illustrated numerically.

6.2 The Structure of Minimal Permutations Between $T_V^n(\mathbf{x})$ and $T_W^n(\mathbf{x})$

Let
$$Y_x^+ = \{y \in Y : V_x(y) > W_x(y)\}, \qquad Y_x^- = \{y \in Y : V_x(y) < W_x(y)\}, \eqno(6.8)$$

$$k_x(y) = \begin{cases} nP_x\,(V_x(y) - W_x(y)), & \text{if } y \in Y_x^+, \\ 0, & \text{if } y \in Y_x^-, \end{cases} \qquad k(y) = \sum_x k_x(y), \qquad k_x = \sum_y k_x(y).$$

For binary-input channels, the equation $PV = PW$ means that
$$P_0 V_0(y) + P_1 V_1(y) = P_0 W_0(y) + P_1 W_1(y)$$
for all $y \in Y$. Therefore, as is easy to see,
$$Y_0^- = Y_1^+, \qquad Y_0^+ = Y_1^-,$$
and
$$k = \sum_y k(y) = \sum_x k_x,$$
where the parameter $k$ is defined in (6.3).

Let us suppose that we want to get a vector $\mathbf{y}' \in T_W^n(\mathbf{x})$ from $\mathbf{y} \in T_V^n(\mathbf{x})$ using a minimal number of permutations of the components (these permutations are referred to as minimal permutations). If $y \in Y_0^+$, then $V_0(y) > W_0(y)$, $V_1(y) < W_1(y)$, and
$$nP_0 V_0(y) - nP_0 W_0(y) = nP_1 W_1(y) - nP_1 V_1(y)$$
because of the condition $PV = PW$. A similar statement is valid for all $y \in Y_1^+$. Therefore, we should select $k_0(y)$ indices $j$ such that $(x_j,y_j) = (0,y)$ for all $y \in Y_0^+$ and $k_1(y)$ indices $j$ such that $(x_j,y_j) = (1,y)$ for all $y \in Y_1^+$. Then $k_0$ and $k_1$ indices $j$ such that $x_j = 0$ and $x_j = 1$ will be selected, and $k_0 = k_1$. The components of the vector $\mathbf{y}$ located at the $k_0$ positions where $x_j = 0$ should be replaced with the components located at the $k_1$ positions where $x_j = 1$, and vice versa. As a result of this procedure, we obtain a vector $\mathbf{y}' \in T_W^n(\mathbf{x})$ with the following properties:
$$n_{xyy}(\mathbf{x},\mathbf{y},\mathbf{y}') = nP_x \min\{V_x(y), W_x(y)\},$$
$$\sum_{y' \ne y} n_{xyy'}(\mathbf{x},\mathbf{y},\mathbf{y}') = nP_x\,(V_x(y) - W_x(y)), \quad \text{if } y \in Y_x^+,$$
$$\sum_{y \ne y'} n_{xyy'}(\mathbf{x},\mathbf{y},\mathbf{y}') = nP_x\,(W_x(y') - V_x(y')), \quad \text{if } y' \in Y_x^-,$$
where $n_{xyy'}(\mathbf{x},\mathbf{y},\mathbf{y}')$ is the number of indices $j$ such that $x_j = x$, $y_j = y$, and $y'_j = y'$.
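For the channels of the Appendix (Table 1), these permutation parameters come out as follows; this short computation reproduces the counts of Table 2:

```python
import numpy as np

n = 2000
P = np.array([0.5, 0.5])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])

# k_x(y) = nP_x (V_x(y) - W_x(y)) on Y_x^+, and 0 on Y_x^-
kxy = np.maximum(n * P[:, None] * (V - W), 0).round().astype(int)
print(kxy)               # [[99 0 10 26], [0 135 0 0]]
print(kxy.sum(axis=1))   # k_0 = k_1 = 135
print(kxy.sum())         # k = 270, as in Table 2
```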

6.3 Decomposition of the Distributions $V^n$ and $W^n$

Convention 6.1 (Z-Convention): For any binary vector $\mathbf{z} = (z_1,\dots,z_n)$, we introduce the sets
$$\bar Z = \{j : z_j = 0\}, \qquad Z = \{j : z_j = 1\}$$
and associate $\mathbf{z}$ with the pair $(\bar Z, Z)$. For any vector $\mathbf{u}$, we write $\mathbf{u} = (\mathbf{u}_{\bar Z}, \mathbf{u}_Z)$, where $\mathbf{u}_{\bar Z}$ is the vector composed of the components $u_j$, $j \in \bar Z$, and $\mathbf{u}_Z$ is the vector composed of the components $u_j$, $j \in Z$ (in further considerations we substitute $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{y}'$ for $\mathbf{u}$).

Let us define the conditional probability distribution $Q = \{Q_x(z)\}$ on $\{0,1\}$ in such a way that
$$Q_x(z) = \begin{cases} (nP_x - k_x)/nP_x, & \text{if } z = 0, \\ k_x/nP_x, & \text{if } z = 1, \end{cases}$$
for all $x \in X$. Since $Q$ is a type on $\{0,1\}^n$ given $\mathbf{x} \in T_P^n$, we can also refer to the set
$$T_Q^n(\mathbf{x}) = \{\mathbf{z} \in \{0,1\}^n : \mathrm{Comp}(\mathbf{x},\mathbf{z}) = \{P_x Q_x(z)\}\}$$
and define a combinatorial channel $Q^n = \{Q^n(\mathbf{z}|\mathbf{x})\}$ as the uniform distribution on $T_Q^n(\mathbf{x})$, i.e.,
$$Q^n(\mathbf{z}|\mathbf{x}) = \begin{cases} |T_Q^n(\mathbf{x})|^{-1}, & \text{if } \mathbf{z} \in T_Q^n(\mathbf{x}), \\ 0, & \text{otherwise.} \end{cases}$$
We also define the conditional probability distributions $U = \{U_x(y)\}$, $\Delta V = \{\Delta V_x(y)\}$, and $\Delta W = \{\Delta W_x(y)\}$ in such a way that
$$U_x(y) = \frac{nP_x \min\{V_x(y), W_x(y)\}}{nP_x - k_x}, \qquad \Delta V_x(y) = \frac{nP_x V_x(y) - (nP_x - k_x) U_x(y)}{k_x}, \qquad \Delta W_x(y) = \frac{nP_x W_x(y) - (nP_x - k_x) U_x(y)}{k_x} \eqno(6.9)$$
for all $x \in X$, $y \in Y$.

Given $\mathbf{x} \in T_P^n$ and $\mathbf{z} \in T_Q^n(\mathbf{x})$, we refer to the sets
$$T_U^{n-k}(\mathbf{x}_{\bar Z}) = \{\mathbf{y}_{\bar Z} \in Y^{n-k} : \mathrm{Comp}(\mathbf{x}_{\bar Z},\mathbf{y}_{\bar Z}) = \{(nP_x - k_x)\, U_x(y)/(n-k)\}\},$$
$$T_{\Delta V}^k(\mathbf{x}_Z) = \{\mathbf{y}_Z \in Y^k : \mathrm{Comp}(\mathbf{x}_Z,\mathbf{y}_Z) = \{k_x\, \Delta V_x(y)/k\}\},$$
$$T_{\Delta W}^k(\mathbf{x}_Z) = \{\mathbf{y}'_Z \in Y^k : \mathrm{Comp}(\mathbf{x}_Z,\mathbf{y}'_Z) = \{k_x\, \Delta W_x(y)/k\}\},$$
where we use the definition of $\mathrm{Comp}$ given in Section 2 as the set of ratios of the number of entries $(x,y)$ in a particular pair of vectors to the length of these vectors.

Let us introduce the combinatorial channels $U^{n-k} = \{U^{n-k}(\mathbf{y}_{\bar Z}|\mathbf{x}_{\bar Z})\}$, $\Delta V^k = \{\Delta V^k(\mathbf{y}_Z|\mathbf{x}_Z)\}$, and $\Delta W^k = \{\Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z)\}$ as the uniform distributions on the sets $T_U^{n-k}(\mathbf{x}_{\bar Z})$, $T_{\Delta V}^k(\mathbf{x}_Z)$, and $T_{\Delta W}^k(\mathbf{x}_Z)$, i.e.,
$$U^{n-k}(\mathbf{y}_{\bar Z}|\mathbf{x}_{\bar Z}) = \begin{cases} |T_U^{n-k}(\mathbf{x}_{\bar Z})|^{-1}, & \text{if } \mathbf{y}_{\bar Z} \in T_U^{n-k}(\mathbf{x}_{\bar Z}), \\ 0, & \text{otherwise,} \end{cases}$$
$$\Delta V^k(\mathbf{y}_Z|\mathbf{x}_Z) = \begin{cases} |T_{\Delta V}^k(\mathbf{x}_Z)|^{-1}, & \text{if } \mathbf{y}_Z \in T_{\Delta V}^k(\mathbf{x}_Z), \\ 0, & \text{otherwise,} \end{cases} \qquad \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) = \begin{cases} |T_{\Delta W}^k(\mathbf{x}_Z)|^{-1}, & \text{if } \mathbf{y}'_Z \in T_{\Delta W}^k(\mathbf{x}_Z), \\ 0, & \text{otherwise.} \end{cases}$$
Then we can represent the process of generating the vectors $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{y}' \in T_W^n(\mathbf{x})$, located at Hamming distance $k$, as a transmission of $\mathbf{x}$ over a combinatorial broadcast channel (Fig. 2), defined by the probabilities
$$F^n(\mathbf{y},\mathbf{y}'|\mathbf{x}) = \sum_{\mathbf{z}} Q^n(\mathbf{z}|\mathbf{x}) \cdot f^{n-k}(\mathbf{y}_{\bar Z},\mathbf{y}'_{\bar Z}|\mathbf{x}) \cdot \Delta V^k(\mathbf{y}_Z|\mathbf{x}_Z) \cdot \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z),$$
where
$$f^{n-k}(\mathbf{y}_{\bar Z},\mathbf{y}'_{\bar Z}|\mathbf{x}) = U^{n-k}(\mathbf{y}_{\bar Z}|\mathbf{x}_{\bar Z}) \cdot \chi\{\mathbf{y}'_{\bar Z} = \mathbf{y}_{\bar Z}\}.$$
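Evaluating (6.9) for the Appendix data reproduces the matrices U, Delta V, and Delta W of Table 1 and exhibits the complementary property (6.11) stated below:

```python
import numpy as np

n = 2000
P = np.array([0.5, 0.5])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])

nPx = (n * P)[:, None]
kx = np.maximum(nPx * (V - W), 0).sum(axis=1, keepdims=True)   # k_x = 135
U  = nPx * np.minimum(V, W) / (nPx - kx)        # common element of V^n and W^n
dV = (nPx * V - (nPx - kx) * U) / kx            # individual contribution of V^n
dW = (nPx * W - (nPx - kx) * U) / kx            # individual contribution of W^n
print(U.round(3))    # [[0.433 0.277 0.145 0.145], [0.175 0.    0.277 0.548]]
print(dV.round(3))   # [[0.733 0.    0.074 0.193], [0.    1.    0.    0.   ]]
print(dW.round(3))   # rows swapped relative to dV: the property (6.11)
```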

These probabilities satisfy (6.1) and (6.2), and the triples
$$(\mathbf{y}_{\bar Z} = \mathbf{y}'_{\bar Z},\ \mathbf{y}_Z,\ \mathbf{y}'_Z) \in (T_U^{n-k}(\mathbf{x}_{\bar Z}),\ T_{\Delta V}^k(\mathbf{x}_Z),\ T_{\Delta W}^k(\mathbf{x}_Z)) \eqno(6.10)$$
are generated at the output of the channel. The decoder $D$ receives $(\mathbf{y}_{\bar Z},\mathbf{y}_Z)$, while the decoder $D'$ receives $(\mathbf{y}_{\bar Z},\mathbf{y}'_Z)$ (note that $\mathbf{z}$ is unknown to $D$ and $D'$). From the point of view of $D$, transmission of the codeword $\mathbf{x}$ over the combinatorial channel $V^n$ can be represented as a process consisting of two steps:

1) the channel assigns a binary vector $\mathbf{z} \in T_Q^n(\mathbf{x})$;

2) the channel distributes the $n-k$ components of the vector $\mathbf{y}$ corresponding to the components 0 of the vector $\mathbf{z}$ in accordance with the distribution $U^{n-k}$, and the $k$ components of $\mathbf{y}$ corresponding to the components 1 of $\mathbf{z}$ in accordance with the distribution $\Delta V^k$.

From the point of view of $D'$, transmission of the codeword $\mathbf{x}$ over the combinatorial channel $W^n$ can be described in the same way, but the distribution $\Delta W^k$ should be used at the second step instead of $\Delta V^k$.

Note that the distributions $\Delta W$ and $\Delta V$ have the complementary property
$$\Delta V_0(y) = \Delta W_1(y), \qquad \Delta W_0(y) = \Delta V_1(y) \eqno(6.11)$$
for all $y \in Y$. Note also that the average distortion functions $d(P,V)$ and $d(P,W)$ can be written as
$$d(P,V) = \frac{nP_1 - k_1}{n} \sum_y U_1(y)\, d_1(y) + \frac{k_1}{n} \sum_y \Delta V_1(y)\, d_1(y),$$
$$d(P,W) = \frac{nP_1 - k_1}{n} \sum_y U_1(y)\, d_1(y) + \frac{k_1}{n} \sum_y \Delta W_1(y)\, d_1(y),$$
where we used the equations $d_0(y) = 0$ for all $y \in Y$. Therefore, the restriction
$$d(P,V) \le d(P,W)$$
can be represented as the inequality
$$\sum_y (\Delta V_1(y) - \Delta W_1(y)) \cdot d_1(y) \le 0. \eqno(6.12)$$
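With the Table 1 values, the quantity in (6.12) is indeed (just barely) nonpositive, which is the numerical margin that drives the Appendix example:

```python
import numpy as np

d1  = np.array([4.0, 3.0, 1.0, 0.0])             # d_1(y) from Table 1
dV1 = np.array([0.000, 1.000, 0.000, 0.000])     # Delta V_1(y)
dW1 = np.array([0.733, 0.000, 0.074, 0.193])     # Delta W_1(y)
print(np.dot(dV1 - dW1, d1))    # ~ -0.006 <= 0, consistent with (6.12)
```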

6.4 Permutations Conserving Relations Between Distortions for Binary-Input Channels

In this subsection we show the idea for a simplified case and then extend the considerations to a more general case.

Let $\mathbf{z} \in T_Q^n(\mathbf{x})$, $\mathbf{y} \in T_V^n(\mathbf{x})$, and $\mathbf{y}' \in T_W^n(\mathbf{x})$ be given. Suppose also that $\hat{\mathbf{x}} \in G_P^n \setminus \{\mathbf{x}\}$ is fixed. Let
$$k_{x\hat x} = \sum_{j : z_j = 1} \chi\{(x_j,\hat x_j) = (x,\hat x)\}, \qquad k_{x\hat x}(y) = \sum_{j : z_j = 1} \chi\{(x_j,\hat x_j,y_j) = (x,\hat x,y)\}, \qquad k_{x\hat x}(y') = \sum_{j : z_j = 1} \chi\{(x_j,\hat x_j,y'_j) = (x,\hat x,y')\} \eqno(6.13)$$
denote the number of entries $(x,\hat x)$ in the pair $(\mathbf{x}_Z,\hat{\mathbf{x}}_Z)$, the number of entries $(x,\hat x,y)$ in the triple $(\mathbf{x}_Z,\hat{\mathbf{x}}_Z,\mathbf{y}_Z)$, and the number of entries $(x,\hat x,y')$ in the triple $(\mathbf{x}_Z,\hat{\mathbf{x}}_Z,\mathbf{y}'_Z)$, respectively.

Proposition 6.2: Let $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{y}' \in T_W^n(\mathbf{x})$ be connected by (6.10). If $\hat{\mathbf{x}} \in G_P^n$ and $\mathbf{z} \in T_Q^n(\mathbf{x})$ are chosen in such a way that
$$k_{x\hat x}(y) = k_{x\hat x} \cdot \Delta V_x(y), \quad \text{for all } y \in Y, \qquad k_{x\hat x}(y') = k_{x\hat x} \cdot \Delta W_x(y'), \quad \text{for all } y' \in Y, \eqno(6.14)$$
then
$$d(\hat{\mathbf{x}},\mathbf{y}) \le d(\mathbf{x},\mathbf{y}) \implies d(\hat{\mathbf{x}},\mathbf{y}') \le d(\mathbf{x},\mathbf{y}').$$

Proof: Let
$$\Delta = (d(\mathbf{x},\mathbf{y}') - d(\hat{\mathbf{x}},\mathbf{y}')) - (d(\mathbf{x},\mathbf{y}) - d(\hat{\mathbf{x}},\mathbf{y})). \eqno(6.15)$$
Since
$$d(\mathbf{x},\mathbf{y}) = d(\mathbf{x}_{\bar Z},\mathbf{y}_{\bar Z}) + d(\mathbf{x}_Z,\mathbf{y}_Z), \qquad d(\mathbf{x},\mathbf{y}') = d(\mathbf{x}_{\bar Z},\mathbf{y}_{\bar Z}) + d(\mathbf{x}_Z,\mathbf{y}'_Z),$$
$$d(\hat{\mathbf{x}},\mathbf{y}) = d(\hat{\mathbf{x}}_{\bar Z},\mathbf{y}_{\bar Z}) + d(\hat{\mathbf{x}}_Z,\mathbf{y}_Z), \qquad d(\hat{\mathbf{x}},\mathbf{y}') = d(\hat{\mathbf{x}}_{\bar Z},\mathbf{y}_{\bar Z}) + d(\hat{\mathbf{x}}_Z,\mathbf{y}'_Z),$$
we change the order of summands in (6.15) and write
$$\Delta = (d(\mathbf{x}_Z,\mathbf{y}'_Z) - d(\mathbf{x}_Z,\mathbf{y}_Z)) - (d(\hat{\mathbf{x}}_Z,\mathbf{y}'_Z) - d(\hat{\mathbf{x}}_Z,\mathbf{y}_Z)).$$

Thus
$$\Delta = \Big(\sum_{y'} k_{10}(y')\, d_1(y') - \sum_y k_{10}(y)\, d_1(y)\Big) - \Big(\sum_{y'} k_{01}(y')\, d_1(y') - \sum_y k_{01}(y)\, d_1(y)\Big) \eqno(6.16)$$
$$= k_{10} \sum_y (\Delta W_1(y) - \Delta V_1(y))\, d_1(y) - k_{01} \sum_y (\Delta W_0(y) - \Delta V_0(y))\, d_1(y)$$
$$= (k_{10} + k_{01}) \sum_y (\Delta W_1(y) - \Delta V_1(y))\, d_1(y) \ge 0.$$
The first equation in (6.16) follows from the observation that we may consider only the noncoinciding components of $\mathbf{x}$ and $\hat{\mathbf{x}}$. The second equation follows from (6.14). Then we have used (6.11) and (6.12). Hence,
$$d(\mathbf{x},\mathbf{y}') - d(\hat{\mathbf{x}},\mathbf{y}') = d(\mathbf{x},\mathbf{y}) - d(\hat{\mathbf{x}},\mathbf{y}) + \Delta \ge 0.$$
Q.E.D.

The statement below generalizes these considerations. The proof is omitted since it is similar to the proof of Proposition 6.2.

Proposition 6.3: Let $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{y}' \in T_W^n(\mathbf{x})$ be connected by (6.10). If $\hat{\mathbf{x}} \in G_P^n$ and $\mathbf{z} \in T_Q^n(\mathbf{x})$ are chosen in such a way that
$$|k_{x\hat x}(y) - k_{x\hat x}\, \Delta V_x(y)| \le \delta_n k_{x\hat x}, \quad \text{for all } (x,\hat x,y) \in X \times X \times Y, \eqno(6.17)$$
$$|k_{x\hat x}(y') - k_{x\hat x}\, \Delta W_x(y')| \le \delta_n k_{x\hat x}, \quad \text{for all } (x,\hat x,y') \in X \times X \times Y,$$
where $k_{x\hat x}$, $k_{x\hat x}(y)$, and $k_{x\hat x}(y')$ are defined in (6.13), and
$$\sum_y (\Delta W_1(y) - \Delta V_1(y)) \cdot d_1(y) \ge 2\delta_n d_{\max}, \eqno(6.18)$$
then
$$d(\hat{\mathbf{x}},\mathbf{y}) \le d(\mathbf{x},\mathbf{y}) \implies d(\hat{\mathbf{x}},\mathbf{y}') \le d(\mathbf{x},\mathbf{y}').$$

6.5 Restrictions of the Code $G_P^n$

In this subsection we deal with the probabilistic ensemble
$$\{XYZ,\ P_x V_x(y) \hat Q_{x,y}(z)\},$$
where $Z = \{0,1\}$,
$$\hat Q_{x,y}(z) = \begin{cases} 1 - k_x(y)/(nP_x V_x(y)), & \text{if } z = 0, \\ k_x(y)/(nP_x V_x(y)), & \text{if } z = 1, \end{cases} \eqno(6.19)$$
and the parameters $k_x(y)$, $x \in X$, $y \in Y$, are defined in (6.8). We consider the distribution $\hat Q = \{\hat Q_{x,y}(z)\}$ as a conditional type on $\{0,1\}^n$ given $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{x} \in T_P^n$ and introduce the set
$$T_{\hat Q}^n(\mathbf{x},\mathbf{y}) = \{\mathbf{z} \in \{0,1\}^n : n_{xyz}(\mathbf{x},\mathbf{y},\mathbf{z}) = nP_x V_x(y) \hat Q_{x,y}(z) \text{ for all } (x,y,z) \in X \times Y \times Z\},$$
where $n_{xyz}(\mathbf{x},\mathbf{y},\mathbf{z})$ denotes the number of entries $(x,y,z)$ in the triple $(\mathbf{x},\mathbf{y},\mathbf{z})$. Let the uniform distribution on $T_{\hat Q}^n(\mathbf{x},\mathbf{y})$ be given as
$$\hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) = \begin{cases} |T_{\hat Q}^n(\mathbf{x},\mathbf{y})|^{-1}, & \text{if } \mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y}), \\ 0, & \text{otherwise.} \end{cases}$$

Let us fix $\hat{\mathbf{x}} \in G_P^n$, $\mathbf{y} \in T_V^n(\mathbf{x})$, and $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$. In Subsection 6.2 we partitioned the output alphabet $Y$ into two disjoint subsets, $Y_0^+$ and $Y_1^+$ (see (6.8)), and we can say that either $y \in Y_0^+$ or $y \in Y_1^+$ for any $y \in Y$. Let $x(y) \in \{0,1\}$ be defined in such a way that
$$x(y) = x \iff y \in Y_x^+, \eqno(6.20)$$
and let
$$k_{x\hat x}(y) = \sum_{j : z_j = 1} \chi\{(\hat x_j, y_j) = (\hat x, y)\}, \quad x = x(y). \eqno(6.21)$$
Besides, let
$$k_{x\hat x} = \sum_{y \in Y_x^+} \sum_{j : z_j = 1} \chi\{(\hat x_j, y_j) = (\hat x, y)\}. \eqno(6.22)$$
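For the Appendix data, the marking probabilities (6.19) take the following values, mirroring the counts 99, 135, 10, 26 used there (a small helper script, assuming the Table 1 matrices):

```python
import numpy as np

n = 2000
P = np.array([0.5, 0.5])
V = np.array([[0.474, 0.240, 0.135, 0.151],
              [0.151, 0.135, 0.240, 0.474]])
W = np.array([[0.375, 0.375, 0.125, 0.125],
              [0.250, 0.000, 0.250, 0.500]])

kxy = np.maximum(n * P[:, None] * (V - W), 0)    # k_x(y) from (6.8)
Qhat1 = kxy / (n * P[:, None] * V)               # Qhat_{x,y}(1); Qhat_{x,y}(0) = 1 - Qhat1
print(Qhat1.round(3))
# [[0.209 0.    0.074 0.172]     i.e. 99/474, 10/135, 26/151
#  [0.    1.    0.    0.   ]]    and 135/135 for (x, y) = (1, 1)
```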

If $k_{x\hat x} > 0$, then $\{k_{x\hat x}(y)/k_{x\hat x},\ y \in Y_x^+\}$ are probability distributions on $Y_x^+$ for all pairs $(x,\hat x) \in X \times X$. In the definition below, we construct a subcode of $G_P^n$ consisting of the codewords for which these distributions are close to $\{\Delta V_x(y),\ y \in Y\}$ for all $\hat x \in X$.

Definition 6.4: The code $G_P^n(\mathbf{y}_Z)$ will be referred to as a $\Delta V$-restricted code given $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$ if it consists of the codewords $\hat{\mathbf{x}} \in G_P^n$ such that
$$|k_{x\hat x}(y) - k_{x\hat x}\, \Delta V_x(y)| \le \delta_n k_{x\hat x}, \quad \text{for all } (x,\hat x,y) \in X \times X \times Y, \eqno(6.23)$$
where the parameters $k_{x\hat x}(y)$ and $k_{x\hat x}$ are defined in (6.21) and (6.22).

Proposition 6.5: Let
$$\hat P_d^{(n)}(\mathbf{x},V^n) = \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y}|\mathbf{z})\},$$
where
$$D(\mathbf{x},\mathbf{y}|\mathbf{z}) = \min_{\hat{\mathbf{x}} \in G_P^n(\mathbf{y}_Z) \setminus \{\mathbf{x}\}} d(\hat{\mathbf{x}},\mathbf{y}). \eqno(6.24)$$
Then there exist sequences $\{\delta_n\}$ such that
$$\delta_n \to 0, \quad \delta_n \sqrt{n} \to \infty, \quad \text{as } n \to \infty, \eqno(6.25)$$
and such that the sequences $\{P_d^{(n)}(V^n)\}$ and $\{\hat P_d^{(n)}(V^n)\}$ approximate each other, i.e.,
$$\Upsilon(\{P_d^{(n)}(V^n)\}) = \Upsilon(\{\hat P_d^{(n)}(V^n)\}). \eqno(6.26)$$

Proof: First of all, we write
$$\Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{P_d^{(n)}(V^n)\}), \eqno(6.27)$$
since $\mathbf{x} \in G_P^n(\mathbf{y}_Z)$ for all $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$.

Using (6.8), (6.19), and (6.20), we note that $\hat Q_{x,y}(1) > 0$ only if $x = x(y)$. Therefore, we can introduce the following conditional probability distribution on $\{0,1\}$:
$$\tilde Q_y(z) = \hat Q_{x,y}(z), \quad \text{where } x = x(y).$$
The distribution $\tilde Q = \{\tilde Q_y(z)\}$ can also be considered as a conditional type on $\{0,1\}^n$ given $\mathbf{y} \in T_{PV}^n$, i.e., we refer to the set
$$T_{\tilde Q}^n(\mathbf{y}) = \{\mathbf{z} \in \{0,1\}^n : \mathrm{Comp}(\mathbf{y},\mathbf{z}) = \{PV(y)\, \tilde Q_y(z)\}\}.$$

Let the uniform distribution on $T_{\tilde Q}^n(\mathbf{y})$ be given as
$$\tilde Q^n(\mathbf{z}|\mathbf{y}) = \begin{cases} |T_{\tilde Q}^n(\mathbf{y})|^{-1}, & \text{if } \mathbf{z} \in T_{\tilde Q}^n(\mathbf{y}), \\ 0, & \text{otherwise.} \end{cases}$$
Let us fix $\Delta V$-restricted codes for all $\mathbf{z} \in T_{\tilde Q}^n(\mathbf{y})$ in the same way as for the vectors $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$ and define the decoding error probability for the codeword $\mathbf{x}$ as
$$\tilde P_d^{(n)}(\mathbf{x},V^n) = \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \tilde Q^n(\mathbf{z}|\mathbf{y}) \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y}|\mathbf{z})\},$$
where the function $D(\mathbf{x},\mathbf{y}|\mathbf{z})$ is defined in (6.24). Note that
$$\tilde P_d^{(n)}(\mathbf{x},V^n) = \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \tilde Q^n(\mathbf{z}|\mathbf{y}) \cdot \chi\{\mathbf{x} \notin G_P^n(\mathbf{y}_Z)\} \eqno(6.28)$$
$$+ \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \tilde Q^n(\mathbf{z}|\mathbf{y}) \cdot \chi\{\mathbf{x} \in G_P^n(\mathbf{y}_Z)\} \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y}|\mathbf{z})\}$$
and
$$\hat P_d^{(n)}(\mathbf{x},V^n) \ge \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \tilde Q^n(\mathbf{z}|\mathbf{y}) \cdot \chi\{\mathbf{x} \in G_P^n(\mathbf{y}_Z)\} \cdot \chi\{d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y}|\mathbf{z})\}, \eqno(6.29)$$
since the vectors $\mathbf{z} \in T_{\tilde Q}^n(\mathbf{y})$ put stronger restrictions on the incorrect codewords included in the collection $G_P^n(\mathbf{y}_Z)$ for all $\mathbf{y} \in T_V^n(\mathbf{x})$ compared to the vectors $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$. However, using the same arguments as in Lemma 4.1, we conclude that there exist sequences $\{\delta_n\}$ satisfying (6.25) such that
$$\sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \tilde Q^n(\mathbf{z}|\mathbf{y}) \cdot \chi\{\mathbf{x} \in G_P^n(\mathbf{y}_Z)\} \ge 1 - \epsilon_n, \eqno(6.30)$$
where $\{\epsilon_n\}$ is a sequence, depending on $\{\delta_n\}$, such that $\epsilon_n \to 0$ as $n \to \infty$. Thus, combining (6.28)-(6.30), we conclude that
$$\Upsilon(\{\tilde P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(V^n)\}). \eqno(6.31)$$

The condition $R > I(P,V)$ does not give an opportunity to partition the space consisting of all possible pairs $(\mathbf{y},\mathbf{z})$, where $\mathbf{y} \in T_{PV}^n$ and $\mathbf{z} \in T_{\tilde Q}^n(\mathbf{y})$. Therefore,
$$\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{\tilde P_d^{(n)}(V^n)\}). \eqno(6.32)$$
As a result of (6.31) and (6.32), we have
$$\Upsilon(\{P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(V^n)\}) \eqno(6.33)$$
and complete the proof using (6.27) and (6.33). Q.E.D.

Note that $\mathbf{x} \in G_P^n(\mathbf{y}_Z)$ and that if $\hat{\mathbf{x}} \in G_P^n(\mathbf{y}_Z)$, then the first group of inequalities in (6.17) is satisfied.

Similar considerations are valid for the decoder $D'$, which receives a vector $\mathbf{y}' \in T_W^n(\mathbf{x})$. To avoid more complex notation, we distinguish between the parameters of $D$ and $D'$ by writing $\mathbf{y}'$ and $y'$ instead of $\mathbf{y}$ and $y$, respectively. Let
$$\hat Q_{x,y'}(z) = \begin{cases} 1 - k_x(y')/(nP_x W_x(y')), & \text{if } z = 0, \\ k_x(y')/(nP_x W_x(y')), & \text{if } z = 1, \end{cases}$$
$$T_{\hat Q}^n(\mathbf{x},\mathbf{y}') = \{\mathbf{z} \in \{0,1\}^n : n_{xy'z}(\mathbf{x},\mathbf{y}',\mathbf{z}) = nP_x W_x(y') \hat Q_{x,y'}(z) \text{ for all } (x,y',z) \in X \times Y \times Z\},$$
$$\hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}') = \begin{cases} |T_{\hat Q}^n(\mathbf{x},\mathbf{y}')|^{-1}, & \text{if } \mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y}'), \\ 0, & \text{otherwise.} \end{cases}$$

Definition 6.6: The code $G_P^n(\mathbf{y}'_Z)$ will be referred to as a $\Delta W$-restricted code given $\mathbf{y}' \in T_{PW}^n$ and $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y}')$ if it consists of the codewords $\hat{\mathbf{x}} \in G_P^n$ such that
$$|k_{x\hat x}(y') - k_{x\hat x}\, \Delta W_x(y')| \le \delta_n k_{x\hat x}, \quad \text{for all } (x,\hat x,y') \in X \times X \times Y, \eqno(6.34)$$
where
$$k_{x\hat x}(y') = \sum_{j : z_j = 1} \chi\{(\hat x_j, y'_j) = (\hat x, y')\}, \quad x = x(y'), \qquad k_{x\hat x} = \sum_{y' \in Y_x^-} \sum_{j : z_j = 1} \chi\{(\hat x_j, y'_j) = (\hat x, y')\},$$

and
$$x(y') = x \iff y' \in Y_x^-.$$

Proposition 6.7: Let
$$\hat P_d^{(n)}(\mathbf{x},W^n) = \sum_{\mathbf{y}',\mathbf{z}} W^n(\mathbf{y}'|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}') \cdot \chi\{d(\mathbf{x},\mathbf{y}') \ge D(\mathbf{x},\mathbf{y}'|\mathbf{z})\},$$
where
$$D(\mathbf{x},\mathbf{y}'|\mathbf{z}) = \min_{\hat{\mathbf{x}} \in G_P^n(\mathbf{y}'_Z) \setminus \{\mathbf{x}\}} d(\hat{\mathbf{x}},\mathbf{y}').$$
Then there exist sequences $\{\delta_n\}$ such that
$$\delta_n \to 0, \quad \delta_n \sqrt{n} \to \infty, \quad \text{as } n \to \infty, \eqno(6.35)$$
and such that the sequence $\{\hat P_d^{(n)}(W^n)\}$ approximates the sequence $\{\hat P_d^{(n)}(V^n)\}$, i.e.,
$$\Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(W^n)\}). \eqno(6.36)$$

Proof: Let us suppose that $d(\mathbf{x},\mathbf{y}) \ge D(\mathbf{x},\mathbf{y}|\mathbf{z})$ for some $\mathbf{y} \in T_V^n(\mathbf{x})$ and $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$. Then there exists an incorrect codeword $\hat{\mathbf{x}}$ such that $\hat{\mathbf{x}} \in G_P^n(\mathbf{y}_Z)$ and
$$d(\hat{\mathbf{x}},\mathbf{y}) \le d(\mathbf{x},\mathbf{y}). \eqno(6.37)$$
Let the parameters $k_{x\hat x}$, defined in (6.22), correspond to this codeword. We generate $\mathbf{y}' \in T_W^n(\mathbf{x})$ such that (6.10) is valid, i.e., we set $\mathbf{y}'_{\bar Z} = \mathbf{y}_{\bar Z}$ and distribute the components of the vector $\mathbf{y}'_Z \in T_{\Delta W}^k(\mathbf{x}_Z)$. It means that we distribute $k_0$ symbols $y \in Y_0^-$ on the $k_{00}$ and $k_{01}$ positions $j$ where $x_j = 0$, and $k_1$ symbols $y \in Y_1^-$ on the $k_{10}$ and $k_{11}$ positions $j$ where $x_j = 1$. However, these positions are fixed and, typically, we obtain the same conditional distribution on the $k_{x0}$ and $k_{x1}$ positions as a result of this procedure (note that the main difficulty of the previous analysis was the point that there are exponentially many incorrect codewords, and we could not fix one of them). Therefore, we need to assign $\delta_n$ in such a way that the distributions which do not satisfy (6.34) will be classified as large deviations. We refer to the arguments leading

to the Delta-convention again [2, Convention 2.11] and conclude that we can take a sequence $\{\delta_n\}$ satisfying (6.35). Then we note that the conditions (6.23) and (6.34) coincide with (6.17) for the codeword $\hat{\mathbf{x}}$ satisfying (6.37). We suppose that
$$d(P,V) \le d(P,W) - 2\delta_n d_{\max} \eqno(6.38)$$
and use the result of Proposition 6.3. The formal steps are given below.
$$\hat P_d^{(n)}(\mathbf{x},V^n) = \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \cdot \chi\{\exists \hat{\mathbf{x}} : \hat{\mathbf{x}} \in G_P^n(\mathbf{y}_Z) \text{ and } d(\mathbf{x},\mathbf{y}) \ge d(\hat{\mathbf{x}},\mathbf{y})\}$$
$$\longrightarrow \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \sum_{\mathbf{y}'_Z} \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) \cdot \chi\{\exists \hat{\mathbf{x}} : \hat{\mathbf{x}} \in G_P^n(\mathbf{y}_Z),\ \hat{\mathbf{x}} \in G_P^n(\mathbf{y}'_Z), \text{ and } d(\mathbf{x},\mathbf{y}) \ge d(\hat{\mathbf{x}},\mathbf{y})\}$$
$$\le \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \sum_{\mathbf{y}'_Z} \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) \cdot \chi\{\exists \hat{\mathbf{x}} : \hat{\mathbf{x}} \in G_P^n(\mathbf{y}'_Z) \text{ and } d(\mathbf{x},\mathbf{y}') \ge d(\hat{\mathbf{x}},\mathbf{y}')\}$$
$$= \sum_{\mathbf{y},\mathbf{z}} V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \sum_{\mathbf{y}'_Z} \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) \cdot \chi\{d(\mathbf{x},\mathbf{y}') \ge D(\mathbf{x},\mathbf{y}'|\mathbf{z})\}$$
$$= \sum_{\mathbf{y}',\mathbf{z}} W^n(\mathbf{y}'|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}') \cdot \chi\{d(\mathbf{x},\mathbf{y}') \ge D(\mathbf{x},\mathbf{y}'|\mathbf{z})\} = \hat P_d^{(n)}(\mathbf{x},W^n)$$
(the arrow denotes replacing the indicator by its average over $\mathbf{y}'_Z$, which changes the sum by a vanishing amount, as discussed above), where we also used the equations
$$V^n(\mathbf{y}|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}) \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) = Q^n(\mathbf{z}|\mathbf{x})\, U^{n-k}(\mathbf{y}_{\bar Z}|\mathbf{x}_{\bar Z})\, \Delta V^k(\mathbf{y}_Z|\mathbf{x}_Z)\, \Delta W^k(\mathbf{y}'_Z|\mathbf{x}_Z) = W^n(\mathbf{y}'|\mathbf{x}) \hat Q^n(\mathbf{z}|\mathbf{x},\mathbf{y}') \Delta V^k(\mathbf{y}_Z|\mathbf{x}_Z).$$
Finally, we note that the sequence $\{\delta_n\}$ vanishes with $n$, so we can set $\delta_n = 0$ in the equation (6.38) when we formulate the result using the upsilon function. Q.E.D.

6.6 Formal Proof of the Permutation Lemma

The upsilon notation allows us to write the whole proof of the permutation lemma in one line:
$$\Upsilon(\{P_d^{(n)}(V^n)\}) = \Upsilon(\{\hat P_d^{(n)}(V^n)\}) \le \Upsilon(\{\hat P_d^{(n)}(W^n)\}) = \Upsilon(\{P_d^{(n)}(W^n)\}),$$
where we have used (6.26) and (6.36), i.e., (4.14) is valid. The proof of the theorem is obtained when we also write $\Upsilon(\{P_d^{(n)}(V)\}) =$ as the leftmost term of this line and $= \Upsilon(\{P_d^{(n)}(W)\})$ as the rightmost term.

7 Acknowledgement

The author wishes to thank Prof. R. Johannesson for the possibility to work on the mismatched decoding problem in his department and for his help in preparing the manuscript. The author is also grateful to Prof. R. Ahlswede, Prof. N. Cai, Prof. I. Csiszár, Prof. G. Kaplan, Prof. A. Lapidoth, Prof. N. Merhav, Prof. P. Narayan, and Prof. S. Shamai (Shitz) for their interest in the results and fruitful discussions.

Appendix: An Example of Mismatched Decoding

Let $Y = \{0,1,2,3\}$, $P_0 = P_1 = 1/2$. The marginal probabilities $\{PW(y)\}$ and the values of the distortion function $\{d_x(y)\}$ are defined in Table 1 by the vector $PW$ and the matrix $d$, respectively.

Note that the minimization of the distortion function $d$, when a fixed $P$-composition code is used, is equivalent to maximum likelihood decoding for the memoryless channel defined by the following matrix of transition probabilities:
$$W_0 = \begin{bmatrix} 1/2 & 1/4 & 1/8 & 1/8 \\ 1/8 & 1/8 & 1/4 & 1/2 \end{bmatrix}.$$
Really,
$$-\log_2 W_0 = \begin{bmatrix} 1 & 2 & 3 & 3 \\ 3 & 3 & 2 & 1 \end{bmatrix},$$
and the minimization of the distortion function
$$d'_x(y) = -\log_2 W_{0,x}(y) \eqno(A.1)$$
is equivalent to the minimization of the function
$$d_x(y) = c \cdot (d'_x(y) + f_x + \varphi(y)) \eqno(A.2)$$
for any $c > 0$, $\{f_x\}$, and $\{\varphi(y)\}$. If
$$c = 1, \quad f_0 = 0, \quad f_1 = 2, \quad \varphi(y) = \log_2 W_{0,0}(y), \quad \text{for } y = 0,1,2,3,$$
then the values of the distortion function defined by (A.1), (A.2) coincide with the values given in the matrix $d$.

The matrix $W$ given in Table 1 was obtained by changing the first two columns of $W_0$. This does not affect the marginal probability distribution on $Y$, but it increases the average distortion:
$$d(P,W_0) = 0.5 \cdot (4 \cdot 1/8 + 3 \cdot 1/8 + 1 \cdot 1/4 + 0 \cdot 1/2) = 0.5625,$$
$$d(P,W) = 0.5 \cdot (4 \cdot 1/4 + 3 \cdot 0 + 1 \cdot 1/4 + 0 \cdot 1/2) = 0.625.$$

Therefore, $V = W_0$ satisfies the restrictions
$$PV = PW, \qquad d(P,V) \le d(P,W)$$
and belongs to the minimization domain in (3.9).

The transition probabilities $V = \{V_x(y)\}$, given in Table 1 by the matrix $V$, satisfy the same restrictions, since
$$PV(0) = PV(3) = 0.5 \cdot (0.474 + 0.151) = 0.3125,$$
$$PV(1) = PV(2) = 0.5 \cdot (0.240 + 0.135) = 0.1875,$$
$$d(P,V) = 0.5 \cdot (4 \cdot 0.151 + 3 \cdot 0.135 + 1 \cdot 0.240 + 0 \cdot 0.474) = 0.6245.$$
However,
$$H(V|P) = 1.807 > H(W_0|P) = 1.750 \quad \text{(bits)}.$$
Thus, in comparison with $W_0$, the distribution $V$ gives a stronger restriction on the maximal transmission rate, expressed as follows:
$$C_d(P,W) = H(PW) - H(V|P).$$

It is easy to see that maximum likelihood decoding for the memoryless channel $V$ is also equivalent to the minimization of the distortion function $d = \{d_x(y)\}$. Really,
$$-\log_2 V = \begin{bmatrix} 1.077 & 2.059 & 2.889 & 2.727 \\ 2.727 & 2.889 & 2.059 & 1.077 \end{bmatrix},$$
and if
$$c = 1.21, \quad f_0 = 0, \quad f_1 = 1.65, \quad \varphi(y) = \log_2 V_0(y), \quad \text{for } y = 0,1,2,3,$$
then
$$d_x(y) = c \cdot (-\log_2 V_x(y) + f_x + \varphi(y)).$$
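The Appendix arithmetic can be replayed in a few lines; conditional entropies are reported in bits here (the same two numbers in nats are about 1.253 and 1.213):

```python
import numpy as np

P = np.array([0.5, 0.5])
d = np.array([[0, 0, 0, 0], [4, 3, 1, 0]], dtype=float)
W0 = np.array([[1/2, 1/4, 1/8, 1/8], [1/8, 1/8, 1/4, 1/2]])
W  = np.array([[0.375, 0.375, 0.125, 0.125], [0.250, 0.000, 0.250, 0.500]])
V  = np.array([[0.474, 0.240, 0.135, 0.151], [0.151, 0.135, 0.240, 0.474]])

avg_d = lambda M: np.sum(P[:, None] * M * d)   # average distortion d(P, M)
def H_cond(M):                                 # conditional entropy H(M|P) in bits
    with np.errstate(divide='ignore', invalid='ignore'):
        t = np.where(M > 0, M * np.log2(M), 0.0)
    return -np.sum(P[:, None] * t)

print(P @ W0, P @ W, P @ V)            # identical marginals (0.3125 0.1875 ...)
print(avg_d(W0), avg_d(W), avg_d(V))   # 0.5625  0.625  0.6245
print(H_cond(W0), H_cond(V))           # 1.75  ~1.807 (bits)
```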

Let us suppose that $n = 2000$ and that $\mathbf{x} = 0^{1000} 1^{1000}$ is the transmitted codeword (we denote by $s^i$ the sequence consisting of $i$ repetitions of the symbol $s$). Then
$$\mathbf{y} = 0^{474}\, 1^{240}\, 2^{135}\, 3^{151}\ 0^{151}\, 1^{135}\, 2^{240}\, 3^{474} \in T_V^n(\mathbf{x}),$$
and all the other vectors of the set $T_V^n(\mathbf{x})$ are obtained by all permutations of the components $y_j$, $j \in J_0$, and all permutations of the components $y_j$, $j \in J_1$, where we denote the set of the first 1000 positions by $J_0$ and the set of the last 1000 positions by $J_1$. If we want to get a vector $\mathbf{y}' \in T_W^n(\mathbf{x})$ from $\mathbf{y}$ using the minimal number of pairwise permutations of the components, we have to select 99, 10, and 26 positions $j \in J_0$ where $y_j = 0$, 2, and 3, respectively, and 135 positions $j \in J_1$ where $y_j = 1$. The corresponding components should be interchanged. This procedure is illustrated in Table 2.

We can represent the transmission of the codeword $\mathbf{x}$ over the channel $V^n$ as follows:

1) the channel assigns 135 positions $j \in J_0$ and 135 positions $j \in J_1$;

2) the channel distributes 99, 10, and 26 symbols 0, 2, and 3 on the selected positions of $J_0$ and generates 135 symbols 1 on the selected positions of $J_1$;

3) the channel distributes $474 - 99$, $240$, $135 - 10$, and $151 - 26$ symbols 0, 1, 2, and 3 on the remaining 865 positions of $J_0$, and 151, 240, and 474 symbols 0, 2, and 3 on the remaining 865 positions of $J_1$.

The distributions of step 2) are given by $\Delta V$, and the distributions of step 3) are given by $U$ (Table 1). The vectors $\mathbf{y}' \in T_W^n(\mathbf{x})$, located at Hamming distance $k = 270$ from $\mathbf{y}$, will be obtained if we substitute $(J_1, J_0)$ for $(J_0, J_1)$ in this procedure. Then the distribution of step 2) will be given by $\Delta W$.

We are interested in the case
$$e^{nR} > \frac{2000!}{625!\ 375!\ 375!\ 625!} \cdot \left(\frac{1000!}{474!\ 135!\ 240!\ 151!}\right)^{-2}$$

and suppose that the decoder $D$, having received one of the
$$|T_V^n(\mathbf{x})| = \left(\frac{1000!}{474!\ 135!\ 240!\ 151!}\right)^2$$
vectors $\mathbf{y} \in T_V^n(\mathbf{x})$, can assign one of the
$$|T_{\hat Q}^n(\mathbf{x},\mathbf{y})| = \binom{474}{99} \binom{135}{135} \binom{135}{10} \binom{151}{26}$$
binary vectors $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$ of Hamming weight 270 such that there are 99, 135, 10, and 26 positions $j$ containing $z_j = 1$ and $(x_j, y_j) = (0,0)$, $(1,1)$, $(0,2)$, and $(0,3)$, respectively. Let $\hat{\mathbf{x}} \in G_P^n \setminus \{\mathbf{x}\}$ denote an incorrect codeword, and let $k_{00}(y)$ and $k_{01}(y)$ denote the number of indices $j$ such that $(\hat x_j, y_j) = (0,y)$ and $(\hat x_j, y_j) = (1,y)$, respectively, for $y = 0, 2, 3$. Furthermore, let
$$k_{00} = k_{00}(0) + k_{00}(2) + k_{00}(3), \qquad k_{01} = k_{01}(0) + k_{01}(2) + k_{01}(3).$$
Note that we write the first index 0 because, in our case, $Y_0^+ = \{0,2,3\}$ and $x(y) = 0$ for $y = 0$, 2, and 3. Then the codeword $\hat{\mathbf{x}}$ belongs to the collection which was called a $\Delta V$-restricted code $G_P^n(\mathbf{y}_Z)$ if
$$|k_{00}(0) - k_{00} \cdot 99/135| < \delta_n k_{00}, \qquad |k_{00}(2) - k_{00} \cdot 10/135| < \delta_n k_{00}, \qquad |k_{00}(3) - k_{00} \cdot 26/135| < \delta_n k_{00},$$
and
$$|k_{01}(0) - k_{01} \cdot 99/135| < \delta_n k_{01}, \qquad |k_{01}(2) - k_{01} \cdot 10/135| < \delta_n k_{01}, \qquad |k_{01}(3) - k_{01} \cdot 26/135| < \delta_n k_{01}.$$
We examine the characteristics of the decoding when the decoder selects the codeword with the minimal distortion belonging to $G_P^n(\mathbf{y}_Z)$ and takes the average value of the decoding error probability for the codeword $\mathbf{x}$ over all possible vectors $\mathbf{z} \in T_{\hat Q}^n(\mathbf{x},\mathbf{y})$. Note that, in this procedure, the decoder has some helping information, since the vectors $\mathbf{z}$ are assigned depending on $\mathbf{x}$.

However, such a restriction of the code $G_P^n$ is equivalent to the procedure when the decoder restricts the code using uniformly distributed vectors $\tilde{\mathbf{z}}$ having Hamming weight
$$99 \cdot \frac{625}{474} + 135 \cdot \frac{375}{135} + 10 \cdot \frac{375}{135} + 26 \cdot \frac{625}{151} \approx 641,$$
such that there are $99 \cdot 625/474$, $135 \cdot 375/135$, $10 \cdot 375/135$, and $26 \cdot 625/151$ positions $j$ containing $\tilde z_j = 1$ and $y_j = 0$, 1, 2, and 3, respectively. The assignment of the vector $\tilde{\mathbf{z}}$ does not depend on the transmitted codeword, and the possibility of a reliable decoding procedure with the 'helping' information would lead to a possibility of improving on the behaviour of the decoder that is based only on the received vector $\mathbf{y}$.

Similar considerations are also valid for the decoder $D'$, which receives a vector $\mathbf{y}' \in T_W^n(\mathbf{x})$ and 'constructs' $\Delta W$-restricted codes. The decoding error probability for $D'$ is not less than for $D$ when they use restricted codes, and this note gives the necessary connection between the behaviour of $D$ and $D'$ which allows us to lower-bound the decoding error probability for $D'$.

Table 1: The matrices of transition probabilities which determine the characteristics of mismatched decoding for the channel $W$ with respect to the distortion function $d$ (columns correspond to $y = 0, 1, 2, 3$; rows to $x = 0, 1$).

$$PW = \Big(\frac{1/2 + 1/8}{2},\ \frac{1/4 + 1/8}{2},\ \frac{1/8 + 1/4}{2},\ \frac{1/8 + 1/2}{2}\Big) = (0.3125,\ 0.1875,\ 0.1875,\ 0.3125),$$

$$d = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 4 & 3 & 1 & 0 \end{bmatrix}, \qquad W_0 = \begin{bmatrix} 1/2 & 1/4 & 1/8 & 1/8 \\ 1/8 & 1/8 & 1/4 & 1/2 \end{bmatrix},$$

$$W = \begin{bmatrix} 1/2 - 1/8 & 1/4 + 1/8 & 1/8 & 1/8 \\ 1/8 + 1/8 & 1/8 - 1/8 & 1/4 & 1/2 \end{bmatrix} = \begin{bmatrix} 0.375 & 0.375 & 0.125 & 0.125 \\ 0.250 & 0.000 & 0.250 & 0.500 \end{bmatrix},$$

$$V = \begin{bmatrix} 0.474 & 0.240 & 0.135 & 0.151 \\ 0.151 & 0.135 & 0.240 & 0.474 \end{bmatrix},$$

$$U = \begin{bmatrix} 0.375/0.865 & 0.240/0.865 & 0.125/0.865 & 0.125/0.865 \\ 0.151/0.865 & 0.000/0.865 & 0.240/0.865 & 0.474/0.865 \end{bmatrix} = \begin{bmatrix} 0.433 & 0.277 & 0.145 & 0.145 \\ 0.175 & 0.000 & 0.277 & 0.548 \end{bmatrix},$$

$$\Delta V = \begin{bmatrix} 0.099/0.135 & 0.000/0.135 & 0.010/0.135 & 0.026/0.135 \\ 0.000/0.135 & 0.135/0.135 & 0.000/0.135 & 0.000/0.135 \end{bmatrix} = \begin{bmatrix} 0.733 & 0.000 & 0.074 & 0.193 \\ 0.000 & 1.000 & 0.000 & 0.000 \end{bmatrix},$$

$$\Delta W = \begin{bmatrix} 0.000/0.135 & 0.135/0.135 & 0.000/0.135 & 0.000/0.135 \\ 0.099/0.135 & 0.000/0.135 & 0.010/0.135 & 0.026/0.135 \end{bmatrix} = \begin{bmatrix} 0.000 & 1.000 & 0.000 & 0.000 \\ 0.733 & 0.000 & 0.074 & 0.193 \end{bmatrix}.$$

Table 2: The structure of a vector $\mathbf{y} \in T_V^{2000}(\mathbf{x})$ which is transformed into a vector $\mathbf{y}' \in T_W^{2000}(\mathbf{x})$; the probability distributions $V$ and $W$ are given in Table 1 and $\mathbf{x} = 0^{1000} 1^{1000}$ (the counts $n_{x,y}$ are shown as 'unchanged + permuted' positions).

                         x = 0          x = 1
  Y_x^+                  {0, 2, 3}      {1}
  Y_x^-                  {1}            {0, 2, 3}
  n_{x,0}(x, y)          375 + 99       151 + 0
  n_{x,1}(x, y)          240 + 0        0 + 135
  n_{x,2}(x, y)          125 + 10       240 + 0
  n_{x,3}(x, y)          125 + 26       474 + 0
  k_x(0)                 99             0
  k_x(1)                 0              135
  k_x(2)                 10             0
  k_x(3)                 26             0
  k_x                    135            135

  k(0) = 99,  k(1) = 135,  k(2) = 10,  k(3) = 26,  k = 270.

[Figure 1: A system model of information transmission for mismatched decoding. The codeword $\mathbf{x} \in G_P^n$ is fed to two parallel channels, $V$ and $W$; the decoder $D$ receives $\mathbf{y}$ and outputs $\hat{\mathbf{x}} \in G_P^n$, and the decoder $D'$ receives $\mathbf{y}'$ and outputs $\hat{\mathbf{x}}' \in G_P^n$.]

[Figure 2: A broadcast model for mismatched decoding. The channel $Q^n$ assigns $\mathbf{z} \in T_Q^n(\mathbf{x})$; the channel $U^{n-k}$ generates the common part $\mathbf{y}_{\bar Z} \in T_U^{n-k}(\mathbf{x}_{\bar Z})$, while $\Delta V^k$ and $\Delta W^k$ generate the individual parts $\mathbf{y}_Z \in T_{\Delta V}^k(\mathbf{x}_Z)$ and $\mathbf{y}'_Z \in T_{\Delta W}^k(\mathbf{x}_Z)$; the decoders $D$ and $D'$ output $\hat{\mathbf{x}} \in G_P^n$ and $\hat{\mathbf{x}}' \in G_P^n$.]

References

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Techn. J., vol. 27, pp. 379-423 and 623-656, July and Oct. 1948.

[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981.

[3] J. Y. N. Hui, Fundamental issues of multiple accessing. Ph.D. dissertation, MIT, 1983.

[4] I. Csiszár and J. Körner, "Graph decomposition: A new key to coding theorems," IEEE Trans. Inform. Theory, vol. IT-27, pp. 5-12, Jan. 1981.

[5] V. B. Balakirsky, "Coding theorems for discrete memoryless channels with given decision rule," Lecture Notes in Computer Science, no. 573, Proc. 1st French-Soviet Workshop on Algebraic Coding, July 1991, pp. 142-150.

[6] A. Lapidoth, "Information rates for mismatched decoders," in Proc. 2nd Winter Meeting on Coding and Information Theory (Essen, Germany, Dec. 1993), pp. 12-15.

[7] I. Csiszár and P. Narayan, "Channel capacity for a given decoding rule," in Proc. IEEE Int. Symp. on Information Theory (Trondheim, Norway, June-July 1994), p. 378.

[8] A. Lapidoth, "Mismatched decoding and the multiple access channel," in Proc. IEEE Int. Symp. on Information Theory (Trondheim, Norway, June-July 1994), p. 382.

[9] I. Csiszár and P. Narayan, "Channel capacity for a given decoding rule," IEEE Trans. Inform. Theory, vol. IT-41, pp. 35-43, Jan. 1995.

[10] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Trans. Inform. Theory, vol. IT-40, pp. 1953-1967, Nov. 1994.