BER and the Hamming Codes

1 MAP and ML Decision Rules

Throughout these notes, we shall stick entirely to binary linear block codes. Thus we shall be dealing with vector spaces over the binary field F.

The following is deceptively simple.

Definition 1.1. By a length n, k bit (binary) linear block code we mean a k-dimensional subspace C ⊆ V, where V is an n-dimensional vector space over F. Sometimes we shall refer to C as an (n, k) linear block code.

The idea behind codes is that we wish to transmit k-bit messages across a noisy channel; to do this with some enhanced reliability, we build some measure of redundancy into the code. Thus the more n exceeds k, the greater the degree of redundancy. Of course, this redundancy is at the expense of the efficiency of the code, which is defined as the ratio

\[ \mathrm{Eff}(C) = \frac{\dim C}{\dim V} = \frac{k}{n}. \]

Later on, we'll look more closely at the negative effect of too much redundancy on the performance of certain codes.

Definition 1.2. The binary symmetric channel (BSC) with crossover probability p takes a single binary input x (from the binary field F) and switches it to x + 1 with probability p.

Throughout these notes, we shall assume that the BSC has crossover probability p < 1/2.

The BSC is typically viewed according to the following picture:

[Transition diagram of the BSC: 0 → 0 and 1 → 1 each with probability 1 − p; 0 → 1 and 1 → 0 each with probability p.]


A fruitful way to view the BSC is that it takes an input vector from the code (a "codeword"), say c ∈ C, and adds a random noise vector e ∈ V to c, producing the received vector c + e. This random vector has a distribution dictated by the channel: if e_0 = (ε_1, ε_2, . . . , ε_n) ∈ V, and if e_0 has l components not equal to zero, then

\[ \mathrm{Prob}(e = e_0) = p^{l}(1-p)^{n-l}. \]
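As a quick illustration (not part of the original notes), the following Python sketch samples BSC error vectors and evaluates the probability formula above; the parameters n and p are arbitrary assumptions.

```python
import random

def bsc_error_vector(n, p, rng=random):
    """Sample a BSC error vector of length n: each bit flips independently with probability p."""
    return tuple(1 if rng.random() < p else 0 for _ in range(n))

def error_prob(e, p):
    """Probability that the BSC produces exactly the error pattern e."""
    l = sum(e)                       # number of flipped positions
    n = len(e)
    return p**l * (1 - p)**(n - l)

print(bsc_error_vector(7, 0.1))      # one random length-7 error pattern
print(error_prob((0, 0, 0), 0.1))    # 0.9**3 = 0.729
print(error_prob((1, 0, 0), 0.1))    # 0.1 * 0.9**2 = 0.081
```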

Now assume that V = F^n = {(a_1, a_2, . . . , a_n) | a_i ∈ F}, and let C ⊆ V be a code. An important parameter associated with the code C is its weight (or minimal weight), wt(C). First of all, if v = (a_1, a_2, . . . , a_n) ∈ V, set wt(v) = |{i | a_i ≠ 0}|. Now set

\[ \mathrm{wt}(C) = \min_{0 \neq c \in C} \mathrm{wt}(c). \]

As a result, if our code C has minimal weight wt(C) = m, then we will be able to detect any error pattern that alters fewer than m coordinates of the transmitted codeword. Put differently, if the codeword c is sent and received as c + e, then when 0 < wt(e) ≤ m − 1 we will know that there is an error in the received word, at which point we might ask for a retransmission.

From the above discussion, we see that for a code C of minimal weight m, an error pattern can go undetected only if the error vector e is itself a nonzero codeword, which forces wt(e) ≥ m; hence

\[ \mathrm{Prob}(\text{undetected error}) \le \sum_{l=m}^{n} \binom{n}{l} p^l (1-p)^{n-l} = \binom{n}{m} p^m + \text{higher degree terms}. \]

However, codes are useful not just for error detection; they can also be useful for error correction. For this to work, we need a mechanism for taking a received vector v ∈ V and deciding which codevector was actually sent. In other words, a decision rule is really just a mapping

d : V → C.

At this point, we introduce two commonly applied criteria for decision rules. First, however, it's reasonable to assume that we're trying to minimize the error in making our "decision" each time a vector is received, i.e., when we receive the vector v we want to minimize

Prob(decision error | v is received).

Therefore, our unconditional decision error probability is

\[ \mathrm{Prob}(\text{decision error}) = \sum_{v\in V} \mathrm{Prob}(\text{decision error} \mid v \text{ is received})\,\mathrm{Prob}(v \text{ is received}). \]

If d : V → C is our decision rule, then

Prob(decision error | v is received) = 1− Prob(d(v) is sent | v is received),


which gives us the error probability

\[ \mathrm{Prob}(\text{decision error}) = 1 - \sum_{v\in V} \mathrm{Prob}(d(v) \text{ is sent} \mid v \text{ is received})\,\mathrm{Prob}(v \text{ is received}). \]

This error probability is a minimum precisely when each conditional probability Prob(d(v) is sent | v is received) is a maximum. Note, however, that this conditional probability is dependent on the input distribution, i.e., on the individual probabilities Prob(c is sent), c ∈ C:

\[
\begin{aligned}
\mathrm{Prob}(d(v)\text{ is sent} \mid v\text{ is received}) &= \frac{\mathrm{Prob}(v\text{ is received} \mid d(v)\text{ is sent})\,\mathrm{Prob}(d(v)\text{ is sent})}{\mathrm{Prob}(v\text{ is received})}\\[4pt]
&= \frac{\mathrm{Prob}(v\text{ is received} \mid d(v)\text{ is sent})\,\mathrm{Prob}(d(v)\text{ is sent})}{\displaystyle\sum_{c\in C}\mathrm{Prob}(v\text{ is received} \mid c\text{ is sent})\,\mathrm{Prob}(c\text{ is sent})},
\end{aligned}
\]

where we have used Bayes' Theorem for inverting the conditional probability.

Definition 1.3 (The Maximum A Posteriori Rule). The decision rule d : V → C is called a maximum a posteriori rule, or MAP rule for short, if it maximizes each conditional probability

Prob(d(v) is sent | v is received), v ∈ V.

Part of the difficulty in constructing MAP decision rules is that they are based on "reverse probabilities," whose calculations involve Bayes' Theorem. As such, they are dependent on the input distributions. We consider a couple of simple, but telling, examples.

Example 1. Consider the simplest possible example of a code, viz., take C = F^1 = {0, 1} = V. Assume that the BSC crossover probability is p and that the input distribution is given by Prob(0) = ε, Prob(1) = 1 − ε. Therefore, we have that

\[
\begin{aligned}
\mathrm{Prob}(0\text{ is sent} \mid 0\text{ is received}) &= \frac{\mathrm{Prob}(0\text{ is received} \mid 0\text{ is sent})\,\mathrm{Prob}(0\text{ is sent})}{\mathrm{Prob}(0\text{ is received})}\\[4pt]
&= \frac{\mathrm{Prob}(0\text{ is received} \mid 0\text{ is sent})\,\mathrm{Prob}(0\text{ is sent})}{\displaystyle\sum_{i=0}^{1}\mathrm{Prob}(0\text{ is received} \mid i\text{ is sent})\,\mathrm{Prob}(i\text{ is sent})}\\[4pt]
&= \frac{(1-p)\varepsilon}{(1-p)\varepsilon + p(1-\varepsilon)}.
\end{aligned}
\]


Similarly, one computes

\[
\begin{aligned}
\mathrm{Prob}(1\text{ is sent} \mid 0\text{ is received}) &= \frac{p(1-\varepsilon)}{(1-p)\varepsilon + p(1-\varepsilon)},\\[4pt]
\mathrm{Prob}(0\text{ is sent} \mid 1\text{ is received}) &= \frac{p\varepsilon}{p\varepsilon + (1-p)(1-\varepsilon)},\\[4pt]
\mathrm{Prob}(1\text{ is sent} \mid 1\text{ is received}) &= \frac{(1-p)(1-\varepsilon)}{p\varepsilon + (1-p)(1-\varepsilon)}.
\end{aligned}
\]

Therefore, if 0 is received, what does an MAP decision rule tell us to "decide" as to what was sent? From the above it is evident that if (1 − p)ε ≤ p(1 − ε), which is equivalent to saying that ε ≤ p, then we should decide that 1 was sent. Note that this same condition implies (since p < 1/2) that pε ≤ (1 − p)(1 − ε), which means that if 1 is received, then we should also decide that 1 was sent. That is to say, if ε ≤ p, then we should always decide that 1 was sent! We leave the other cases to the reader to work out.
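For concreteness, here is a small Python sketch (an illustration, not from the original notes) that evaluates the posterior probabilities of Example 1 for one assumed choice of p and ε with ε ≤ p, confirming that the MAP rule then always decides 1.

```python
# Illustrative (assumed) parameters: p = 0.2, eps = 0.1, so that eps <= p.
p, eps = 0.2, 0.1

def posterior_0_sent(received):
    """P(0 sent | received) for the one-bit code, with P(0 sent) = eps and BSC crossover p."""
    like0 = (1 - p) if received == 0 else p      # P(received | 0 sent)
    like1 = p if received == 0 else (1 - p)      # P(received | 1 sent)
    return like0 * eps / (like0 * eps + like1 * (1 - eps))

for r in (0, 1):
    post0 = posterior_0_sent(r)
    decision = 0 if post0 > 1 - post0 else 1
    print(f"received {r}: P(0 sent | received) = {post0:.3f}, MAP decision = {decision}")
# With eps <= p, the MAP rule decides 1 no matter what is received.
```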

Example 2. This time consider the one-dimensional code C = {(0, 0, 0), (1, 1, 1)} ⊆ V = F^3 = {(a_1, a_2, a_3) | a_i ∈ F}. As above, assume a crossover probability of p, and assume an input distribution

Prob(0, 0, 0) = ε, Prob(1, 1, 1) = 1− ε.

Assume that the vector (1, 0, 0) was received. What is the best decision for what was sent ((0, 0, 0) or (1, 1, 1)) according to an MAP decision rule? Again, we must use Bayes' Theorem to calculate the necessary conditional probabilities:

\[
\begin{aligned}
\mathrm{Prob}((0,0,0)\text{ was sent} \mid (1,0,0)\text{ was received}) &= \frac{\mathrm{Prob}((1,0,0)\text{ was received} \mid (0,0,0)\text{ was sent})\,\mathrm{Prob}((0,0,0)\text{ was sent})}{\mathrm{Prob}((1,0,0)\text{ was received})}\\[4pt]
&= \frac{p(1-p)^2\varepsilon}{p(1-p)^2\varepsilon + p^2(1-p)(1-\varepsilon)},
\end{aligned}
\]

whereas,

\[
\begin{aligned}
\mathrm{Prob}((1,1,1)\text{ was sent} \mid (1,0,0)\text{ was received}) &= \frac{\mathrm{Prob}((1,0,0)\text{ was received} \mid (1,1,1)\text{ was sent})\,\mathrm{Prob}((1,1,1)\text{ was sent})}{\mathrm{Prob}((1,0,0)\text{ was received})}\\[4pt]
&= \frac{p^2(1-p)(1-\varepsilon)}{p(1-p)^2\varepsilon + p^2(1-p)(1-\varepsilon)}.
\end{aligned}
\]

Thus, an MAP decision rule would tell us to decide that (0, 0, 0) was sent if p(1 − p)²ε ≥ p²(1 − p)(1 − ε); since p, 1 − p, ε > 0, we see that this condition is equivalent to

\[ \frac{1-p}{p} \ge \frac{1-\varepsilon}{\varepsilon}. \]


Again, the reader can complete this analysis.
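The reader may find the following Python sketch useful for completing the analysis of Example 2; the values of p and ε are illustrative assumptions, and the function simply compares the two posterior numerators furnished by Bayes' Theorem.

```python
from itertools import product

def map_decision(received, p, eps):
    """MAP decision between (0,0,0) and (1,1,1) over a BSC with crossover p,
    where Prob((0,0,0) is sent) = eps."""
    def likelihood(sent):
        flips = sum(r != s for r, s in zip(received, sent))
        return p**flips * (1 - p)**(len(sent) - flips)
    post0 = likelihood((0, 0, 0)) * eps          # numerator for (0,0,0)
    post1 = likelihood((1, 1, 1)) * (1 - eps)    # numerator for (1,1,1)
    return (0, 0, 0) if post0 >= post1 else (1, 1, 1)

# Assumed values: p = 0.1; eps = 0.5 gives majority vote, eps = 0.05 biases toward (1,1,1).
for eps in (0.5, 0.05):
    print(eps, {v: map_decision(v, 0.1, eps) for v in product((0, 1), repeat=3)})
```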

The next decision rule is much easier to implement, as it involves only "forward probabilities" and therefore is independent of the input distribution. This is the so-called maximum likelihood decision rule, as follows:

Definition 1.4 (The Maximum Likelihood Rule). The decision rule d : V → C is called a maximum likelihood rule, or ML rule for short, if it maximizes each conditional probability

Prob(v is received | d(v) is sent), v ∈ V.

Note that for each of the above two examples, an ML decision rule is unique and is given by

d(0) = 0, d(1) = 1, in Example 1;

and

d(0, 0, 0) = d(1, 0, 0) = d(0, 1, 0) = d(0, 0, 1) = (0, 0, 0)

and

d(1, 1, 0) = d(1, 0, 1) = d(0, 1, 1) = d(1, 1, 1) = (1, 1, 1) in Example 2.

Since only the forward probabilities of the channel are involved, we see that an ML decision rule doesn't depend on the input distribution. As the above examples indicate, however, the two decision rules can differ. However, if the input distribution is uniform, then an ML decision rule is an MAP rule (and conversely):

Lemma 1.1. Assume that the input distribution is uniform, i.e., that for each c ∈ C,

\[ \mathrm{Prob}(c \text{ is sent}) = \frac{1}{|C|}. \]

Then any ML decision rule d : V → C is also an MAP rule, and conversely.

Proof. Indeed, for any vector v ∈ V , we have

\[
\begin{aligned}
\mathrm{Prob}(d(v)\text{ is sent} \mid v\text{ is received}) &= \frac{\mathrm{Prob}(v\text{ is received} \mid d(v)\text{ is sent})\,\mathrm{Prob}(d(v)\text{ is sent})}{\mathrm{Prob}(v\text{ is received})}\\[4pt]
&= \frac{\mathrm{Prob}(v\text{ is received} \mid d(v)\text{ is sent})}{|C|\,\mathrm{Prob}(v\text{ is received})}.
\end{aligned}
\]

Therefore, we see that for fixed vector v ∈ V, Prob(d(v) is sent | v is received) is a maximum if and only if Prob(v is received | d(v) is sent) is a maximum.

The next result shows that decision rules based on minimal distance are maximum-likelihood decision rules (and conversely).


Lemma 1.2. Let C ⊆ V be a code. A decision rule d : V → C is a maximum-likelihood decision rule if and only if for each v ∈ V, wt(v + d(v)) is chosen to be a minimum.

Proof. This is obvious, as for any pair c ∈ C, v ∈ V , we have

\[ \mathrm{Prob}(v \text{ is received} \mid c \text{ is sent}) = p^{\mathrm{wt}(v+c)}(1-p)^{n-\mathrm{wt}(v+c)}, \]

which (since p < 1/2) is a maximum if and only if wt(v + c) is a minimum.

Definition 1.5. Let C ⊆ V be a code, and assume that d : V → C is a decision rule. If d satisfies the partial homomorphism property:

d(0) = 0, d(v + c) = d(v) + c, c ∈ C, v ∈ V,

then d is called a standard array decision rule.

Note that a standard array decision rule is uniquely determined by the values d(v_1), d(v_2), . . . , d(v_r), where v_1, v_2, . . . , v_r is a set of coset representatives for C in V. Furthermore, a standard array decision rule d : V → C satisfies d(c) = c for all c ∈ C.

Lemma 1.3. Let C ⊆ V be a code and let v_1 = 0, v_2, . . . , v_r be a set of coset representatives of minimal weight, i.e., for each i = 1, 2, . . . , r, we have

\[ \mathrm{wt}(v_i) = \min_{c\in C} \mathrm{wt}(v_i + c). \]

Then the standard array decision rule given by

d(vi + c) = c, i = 1, 2, . . . , r, c ∈ C,

is an ML decision rule.

Proof. This is virtually obvious, since if c′ ∈ C is closer (in the sense of weight) to v_i + c than is c, then wt(v_i + c + c′) < wt(v_i), contrary to v_i having minimal weight among the elements of v_i + C.

We shall call a decision rule constructed as in Lemma 1.3 a maximum likelihood standard array decision rule; Lemma 1.3 guarantees that such a decision rule really is an ML rule. Note, however, that a given coset v + C might not have a unique element of minimal weight, in which case an ML standard array decision rule is also not unique.

As a simple example, consider the 4-dimensional vector space V = F^4 and take the code C to be the 2-dimensional subspace generated by (1, 0, 1, 1), (0, 1, 0, 1). The four cosets of C in V are listed below:


(0, 0, 0, 0)  (1, 0, 1, 1)  (0, 1, 0, 1)  (1, 1, 1, 0)
(1, 0, 0, 0)  (0, 0, 1, 1)  (1, 1, 0, 1)  (0, 1, 1, 0)
(0, 1, 0, 0)  (1, 1, 1, 1)  (0, 0, 0, 1)  (1, 0, 1, 0)
(0, 0, 1, 0)  (1, 0, 0, 1)  (0, 1, 1, 1)  (1, 1, 0, 0)

Note that each coset except the third contains a unique coset representative of minimal weight. Thus, there are two maximum likelihood standard array decision rules d1, d2 : V → C, which are determined by

d1(0, 0, 0, 0) = d1(1, 0, 0, 0) = d1(0, 1, 0, 0) = d1(0, 0, 1, 0) = 0,

and

d2(0, 0, 0, 0) = d2(1, 0, 0, 0) = d2(0, 0, 0, 1) = d2(0, 0, 1, 0) = 0.

We shall have occasion to refer to this example several more times in the sequel.
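The following Python sketch (an illustration, not part of the original notes) reproduces the coset table above and lists the minimal-weight representatives of each coset; in particular it exhibits the two choices available in the third coset, which give rise to the rules d1 and d2.

```python
from itertools import product

# The (4,2) example code generated by (1,0,1,1) and (0,1,0,1).
add = lambda a, b: tuple((x + y) % 2 for x, y in zip(a, b))
wt = lambda v: sum(v)
g1, g2 = (1, 0, 1, 1), (0, 1, 0, 1)
C = {(0, 0, 0, 0), g1, g2, add(g1, g2)}

# Group the vectors of V = F^4 into cosets v + C.
cosets = {}
for v in product((0, 1), repeat=4):
    key = frozenset(add(v, c) for c in C)      # the coset containing v
    cosets.setdefault(key, set()).add(v)

for coset in cosets.values():
    m = min(wt(u) for u in coset)
    reps = [v for v in coset if wt(v) == m]
    print(sorted(coset), "minimal-weight representatives:", reps)
# The coset {0001, 0100, 1010, 1111} has two minimal-weight elements,
# giving the two ML standard array rules d1 and d2 of the text.
```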

2 Bit Error Rate (BER)

We shall be sending k-bit (binary) messages across our noisy BSC with crossover probability p. We regard the k-bit messages as vectors in the vector space M = F^k = {(a_1, a_2, . . . , a_k) | a_i ∈ F}. These messages are to be encoded as n-bit codewords via some "encoding map"

\[ M = F^k \xrightarrow{\ E\ } F^n = V. \]

The image C = E(M) is the code. Thus, corresponding to the k-bit message word m is the n-bit codeword E(m). Owing to the noise in the channel, the received vector will have the form

\[ E(m) + e, \]

where e is the random error vector having distribution

\[ \mathrm{Prob}(e = e_0) = p^{l}(1-p)^{n-l}, \]

where l = wt(e0).

The entire encoding/transmission/decision/decoding process can be viewed thus:

\[
M \xrightarrow[\text{encode}]{E,\ \cong} C \xrightarrow{\ \text{(transmission)}\ } V \xrightarrow[\text{decision rule}]{d} C \xrightarrow[\text{decode}]{D = E^{-1},\ \cong} M,
\]
\[
m \longmapsto E(m) \longmapsto E(m) + e \longmapsto d(E(m)+e) \longmapsto D(d(E(m)+e)).
\]


That is to say, the intended message m gets encoded, sent, received, decided upon, and decoded as the message D(d(E(m) + e)) ∈ M, where, as usual, e is the random error vector generated by the BSC.
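As a simple end-to-end illustration of this pipeline (not part of the original notes), here is a Monte Carlo sketch in Python using the 3-fold repetition code of Example 2 with the majority-vote (ML) decision rule; the crossover probability and trial count are arbitrary assumptions.

```python
import random

random.seed(0)
p, trials = 0.1, 100_000

def encode(m):                # E : F^1 -> F^3
    return (m, m, m)

def transmit(c):              # BSC: flip each codebit independently with probability p
    return tuple(b ^ (random.random() < p) for b in c)

def decide(v):                # ML rule = majority vote (nearest codeword)
    return (1, 1, 1) if sum(v) >= 2 else (0, 0, 0)

def decode(c):                # D = E^{-1}
    return c[0]

errors = 0
for _ in range(trials):
    m = random.randint(0, 1)
    errors += decode(decide(transmit(encode(m)))) != m
print("estimated BER:", errors / trials)   # exact value: 3p^2(1-p) + p^3 = 0.028
```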

Next, for any vector m ∈ M, denote by m(i) the i-th coordinate of m, i.e., m = (m(1), . . . , m(k)) ∈ M, and consider the conditional probability

\[ \mathrm{Prob}(D(d(E(m)+e))(i) \neq m(i) \mid E(m) \text{ was sent}), \qquad i = 1, 2, \ldots, k. \]

This is simply the probability that the final message word disagrees with the intended message word in the i-th message bit. Put somewhat differently, we may define the random variables

Xi(m) = D(d(E(m) + e))(i) + m(i),

and consider

\[ \mathrm{Prob}(X_i(m) = 1), \]

for i = 1, 2, . . . , k.

We might ask the following questions about the random variables X_i(m), i = 1, 2, . . . , k.

(i) For fixed m ∈ M, are the random variables X_i(m), i = 1, 2, . . . , k, identically distributed?

(ii) For fixed m ∈ M , are the random variables Xi(m), i = 1, 2, . . . , k, independent?

(iii) For fixed i, 1 ≤ i ≤ k, how do the Xi(m) depend on the message vector m ∈ M?

We would certainly expect the answers to the above questions to depend on the decision rule d : V → C.

If we average the probabilities Prob(X_i(m) = 1) over i = 1, 2, . . . , k, then this defines the conditional bit error rate:

\[
\begin{aligned}
\mathrm{BER}(m) &= \frac{1}{k}\sum_{i=1}^{k} \mathrm{Prob}(X_i(m)=1)\\
&= \frac{1}{k}\sum_{i=1}^{k} \mathrm{Prob}(D(d(E(m)+e))(i) \neq m(i) \mid E(m)\text{ was sent}).
\end{aligned}
\]

In other words, given that the message m was the intended message, k · BER(m) is the expected number of bit errors in the message that actually turns up at the receiving end (after deciding and decoding). The (unconditional) bit error rate is the weighted average over all possible messages¹:

¹ This is equivalent to the symbol error rate P_symb given on page 20 of F. J. MacWilliams and N. J. A. Sloane's book, The Theory of Error-Correcting Codes, North-Holland Publishing Company, Amsterdam, 1978. While they don't explicitly say so, their definition is valid only for uniform input distributions.


\[
\begin{aligned}
\mathrm{BER} &= \frac{1}{k}\sum_{m\in M}\sum_{i=1}^{k} \mathrm{Prob}(X_i(m)=1)\,\mathrm{Prob}(E(m)\text{ was sent})\\
&= \frac{1}{k}\sum_{m\in M}\sum_{i=1}^{k} \mathrm{Prob}(D(d(E(m)+e))(i) \neq m(i) \mid E(m)\text{ was sent})\,\mathrm{Prob}(E(m)\text{ was sent}).
\end{aligned}
\]

Thus, kBER is the expected number of message bit errors per transmission.

The above notion of bit error rate is what one might more properly call the post-decoding bit error rate, which is an average of the error probabilities in the message bits. If instead we consider the average of the error probabilities in the code bits, we obtain what would be called the post-decision bit error rate:

\[
\mathrm{BER}_{pd} = \frac{1}{n}\sum_{m\in M}\sum_{i=1}^{n} \mathrm{Prob}\bigl((d(E(m)+e))(i) \neq E(m)(i) \mid E(m)\text{ was sent}\bigr)\,\mathrm{Prob}(E(m)\text{ was sent}).
\]

In analogy with the above, nBERpd is the expected number of codebit errors per transmission.

Next, we show that when we use a standard array decision rule d : V → C, the computations of BER and BERpd can be simplified considerably.

Proposition 2.1. Let C ⊆ V be a code with dim C = k and dim V = n, and let d : V → C be a standard array decision rule. Then the post-decision and post-decoding bit error rates are given by

\[
\mathrm{BER} = \frac{1}{k}\sum_{e\in V} \mathrm{wt}(Dd(e))\,\mathrm{Prob}(e), \qquad\text{and}\qquad \mathrm{BER}_{pd} = \frac{1}{n}\sum_{e\in V} \mathrm{wt}(d(e))\,\mathrm{Prob}(e).
\]

Proof. This is pretty easy. First of all, note that

\[
\begin{aligned}
\mathrm{Prob}(D(d(E(m)+e))(i) \neq m(i) \mid E(m)\text{ was sent}) &= \mathrm{Prob}(D(E(m)+d(e))(i) \neq m(i) \mid E(m)\text{ was sent})\\
&= \mathrm{Prob}((m + D(d(e)))(i) \neq m(i) \mid E(m)\text{ was sent})\\
&= \mathrm{Prob}(D(d(e))(i) \neq 0 \mid E(m)\text{ was sent}).
\end{aligned}
\]

However, the error vector e ∈ V is generated by the BSC independently of which encoded message E(m) was sent, thus

\[ \mathrm{Prob}(D(d(e))(i) \neq 0 \mid E(m)\text{ was sent}) = \mathrm{Prob}(D(d(e))(i) \neq 0). \]


Therefore,

\[
\begin{aligned}
\mathrm{BER} &= \frac{1}{k}\sum_{m\in M}\sum_{i=1}^{k} \mathrm{Prob}(D(d(E(m)+e))(i) \neq m(i) \mid E(m)\text{ was sent})\,\mathrm{Prob}(E(m)\text{ was sent})\\
&= \frac{1}{k}\sum_{m\in M}\sum_{i=1}^{k} \mathrm{Prob}(D(d(e))(i) \neq 0)\,\mathrm{Prob}(E(m)\text{ was sent})\\
&= \frac{1}{k}\sum_{i=1}^{k} \mathrm{Prob}(D(d(e))(i) \neq 0)\\
&= \frac{1}{k}\sum_{e\in V} \mathrm{wt}(Dd(e))\,\mathrm{Prob}(e).
\end{aligned}
\]

The proof of the corresponding recipe for BERpd is entirely similar. In other words, we see that k · BER is the expected weight (measured in M) of the random vector Dd(e), e ∈ V. Similarly, n · BERpd is the expected weight (measured in V) of the random vector d(e), e ∈ V.

It would be of interest to determine under what conditions BER = BERpd. In general, one wouldn't expect them to agree, if only because BER at least ostensibly depends upon the encoding E : M → C, as well as on the decision rule d : V → C, whereas BERpd depends only upon the decision rule. However, if one uses systematic encoding (to be explained below), then one might reasonably inquire as to whether one might have BER = BERpd, say under the assumption of an ML standard array decision rule.

In fact, this is what initially spurred my interest in this endeavor, for in my quest for the BER of the Hamming codes, I was referred by Michele Eile to the "standard reference" by J. H. van Lint, Coding Theory, Lecture Notes in Mathematics, vol. 201, Springer-Verlag, New York, 1973, pp. 25–26. However, van Lint computes nBERpd for the Hamming codes and not kBER. I am still searching for a computation of BER (or kBER), although it appears that for the Hamming codes (under the assumption of systematic encoding), BER = BERpd. I'll give evidence for this in the next section.

Example. We refer again to the example C ⊆ V given at the end of Section 1 and compute its post-decision BER relative to the two ML standard array decision rules d1, d2 : V → C. We have

\[
4\,\mathrm{BER}_{pd} = \sum_{e\in V} \mathrm{wt}(d_1(e))\,\mathrm{Prob}(e) = \sum_{j=0}^{4}\ \sum_{\substack{e\in V\\ \mathrm{wt}(e)=j}} \mathrm{wt}(d_1(e))\,\mathrm{Prob}(e) = 2p(1-p)^3 + 17p^2(1-p)^2 + 10p^3(1-p) + 3p^4.
\]

The same result holds for the decision rule d2 : V → C.
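The expansion above is easily checked by brute force; the following Python sketch recovers the coefficients 2, 17, 10, 3 for the decision rule d1.

```python
from itertools import product

add = lambda a, b: tuple((x + y) % 2 for x, y in zip(a, b))
wt = lambda v: sum(v)
C = [(0,0,0,0), (1,0,1,1), (0,1,0,1), (1,1,1,0)]
reps = [(0,0,0,0), (1,0,0,0), (0,1,0,0), (0,0,1,0)]   # minimal-weight coset representatives for d1

def d1(v):
    """Standard array rule: locate the representative r with v in r + C and return v + r."""
    for r in reps:
        if add(v, r) in C:
            return add(v, r)

# For each error weight j, total the weight of d1(e) over all e of weight j.
coeff = [0] * 5
for e in product((0, 1), repeat=4):
    coeff[wt(e)] += wt(d1(e))
print(coeff)    # expected: [0, 2, 17, 10, 3], the coefficients of p^j (1-p)^(4-j)
```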


Definition 2.1. Let C ⊆ V = F^n be a code. An encoding scheme E : M = F^k → C (an isomorphism onto C) is called systematic, or is a row echelon form encoding scheme, if and only if the k × n matrix G having rows r_1, r_2, . . . , r_k is in row echelon form, where

E(0, 0, . . . , 0, 1, 0, . . . , 0) = r_i ∈ C,

the 1 being in position i.

When this happens, a permutation of the columns can be applied to bring the matrix into the form

\[ G = \bigl[\, I_{k\times k} \mid A_{k\times(n-k)} \,\bigr], \]

and a codeword (a_1, a_2, . . . , a_n) ∈ C will be decoded as

\[ D(a_1, a_2, . . . , a_n) = (a_1, a_2, . . . , a_k) \in M. \]
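Here is a minimal Python sketch of systematic encoding and decoding, using the (4, 2) example code, whose generator matrix already has the form [I | A] (an illustration, not part of the original notes).

```python
# Basis rows of the (4,2) example code; the generator matrix is already [ I_2 | A ].
r1, r2 = (1, 0, 1, 1), (0, 1, 0, 1)

def encode(m):
    """E : (a1, a2) -> a1*r1 + a2*r2 (mod 2)."""
    a1, a2 = m
    return tuple((a1 * x + a2 * y) % 2 for x, y in zip(r1, r2))

def decode(c):
    """D : for a systematic encoding, simply read off the first k = 2 coordinates."""
    return c[:2]

for m in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    c = encode(m)
    print(m, "->", c, "->", decode(c))    # decode(encode(m)) == m in every case
```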

To see how different encoding schemes can lead to different BERs, even with respect to the same decision rule, consider once again the example of the two-dimensional code given at the end of Section 1. Thus, C ⊆ V = F^4 has basis {r_1 = (1, 0, 1, 1), r_2 = (0, 1, 0, 1)} and the encoding scheme E : (a_1, a_2) ↦ a_1 r_1 + a_2 r_2 is systematic. With respect to this choice of encoding and with respect to the ML standard array decision rule d = d1 defined there, one has

\[
2\,\mathrm{BER} = \sum_{j=0}^{4}\ \sum_{\substack{e\in V\\ \mathrm{wt}(e)=j}} \mathrm{wt}(Dd(e))\,\mathrm{Prob}(e) = p(1-p)^3 + 9p^2(1-p)^2 + 5p^3(1-p) + p^4 \approx p + 6p^2,
\]

for small enough p.

On the other hand, were one to take the non-systematic encoding E′ : (a_1, a_2) ↦ a_1 r′_1 + a_2 r′_2, where

r′_1 = r_1 = (1, 0, 1, 1),   r′_2 = r_1 + r_2 = (1, 1, 1, 0),

then the corresponding BER (relative to d = d1) is given by

\[
2\,\mathrm{BER} = \sum_{j=0}^{4}\ \sum_{\substack{e\in V\\ \mathrm{wt}(e)=j}} \mathrm{wt}(D'd(e))\,\mathrm{Prob}(e) = 2p(1-p)^3 + 7p^2(1-p)^2 + 6p^3(1-p) + p^4 \approx 2p + p^2,
\]

for small p. Therefore, we conclude that, at least for small enough values of p (i.e., for a good enough BSC), the BER computed in terms of the ML standard array decision rule d = d1 and systematic encoding is less than the corresponding BER computed in terms of the non-systematic encoding scheme E′ : M → C.²

² The equivocation "for small enough p" turns out not to be necessary, since one can show in this example that the BER for the systematic encoding E is less than the BER with respect to E′ : M → C for all p between 0 and 1/2.
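Both expansions can be confirmed by the following brute-force Python sketch, which tabulates the message-bit error weights for the systematic and non-systematic encodings under the rule d1.

```python
from itertools import product

add = lambda a, b: tuple((x + y) % 2 for x, y in zip(a, b))
wt = lambda v: sum(v)
C = [(0,0,0,0), (1,0,1,1), (0,1,0,1), (1,1,1,0)]
reps = [(0,0,0,0), (1,0,0,0), (0,1,0,0), (0,0,1,0)]

def d1(v):
    for r in reps:
        if add(v, r) in C:
            return add(v, r)

# Decoding maps codeword -> message for E (systematic) and E' (non-systematic).
D_sys = {(0,0,0,0): (0,0), (1,0,1,1): (1,0), (0,1,0,1): (0,1), (1,1,1,0): (1,1)}
D_non = {(0,0,0,0): (0,0), (1,0,1,1): (1,0), (1,1,1,0): (0,1), (0,1,0,1): (1,1)}

for name, D in (("systematic", D_sys), ("non-systematic", D_non)):
    coeff = [0] * 5
    for e in product((0, 1), repeat=4):
        coeff[wt(e)] += wt(D[d1(e)])
    print(name, coeff)
# expected: systematic [0, 1, 9, 5, 1] and non-systematic [0, 2, 7, 6, 1]
```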


Remark. Note that neither of these post-decoding bit error rates agrees with the post-decision BER computed in the example above.

The following theorem would be highly desirable; I'll state it as a conjecture. It should be known, but I've not seen any relevant discussions.

Conjecture. Let C ⊆ V = F^n be a code and fix an ML standard array decision rule d : V → C. Let E, E′ : M = F^k → C be encoding schemes (isomorphisms onto C) with E systematic. If BER_E, BER_{E′} are the corresponding bit error rates, then BER_E ≤ BER_{E′} for small enough p.³ ⁴

3 The Hamming Codes

In principle, the Hamming Codes are very easy to describe. To this end, we fix an l-dimensional vector space W over the binary field F. Let P be the set of nonzero vectors in W and let V = F⟨P⟩ be the vector space with basis P. Thus we have the "tautological map" τ : V → W determined by v ↦ v, v ∈ P. We set H(W) to be the kernel of τ : V → W, and call it the Hamming code on W.⁵ Thus the Hamming code on W fits into an exact sequence

\[ 0 \longrightarrow H(W) \longrightarrow F\langle P\rangle \xrightarrow{\ \tau\ } W \longrightarrow 0, \]

from which it follows that dim H(W) = 2^l − l − 1. Thus, it is obvious that the minimal weight of H(W) is 3, since no two (or fewer) nonzero vectors in W can be linearly dependent. (Note that the vector space V can be identified with the Boolean group (with symmetric difference as the operation) on the set of nonzero vectors of W.)
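For concreteness, the following Python sketch (not part of the original notes) constructs H(W) directly from this definition in the case l = 3 and confirms that dim H(W) = 2^l − l − 1 and that the minimal weight is 3.

```python
from itertools import product

l = 3
P = [w for w in product((0, 1), repeat=l) if any(w)]   # the nonzero vectors of W; a basis of V
n = len(P)                                             # n = 2^l - 1 = 7

def tau(v):
    """The tautological map V -> W: add up (mod 2) the basis vectors selected by v."""
    s = [0] * l
    for bit, w in zip(v, P):
        if bit:
            s = [(a + b) % 2 for a, b in zip(s, w)]
    return tuple(s)

H = [v for v in product((0, 1), repeat=n) if tau(v) == (0,) * l]
print(len(H))                                # 16 = 2^(2^l - l - 1) codewords
print(min(sum(v) for v in H if any(v)))      # minimal weight 3
```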

Definition 3.1. Let C ⊂ V be an (n, k) linear block code of minimal weight δ. We say that C is a perfect code if there exists r < δ such that

\[ V = \bigcup_{c\in C} B_c(r) \quad (\text{disjoint union}), \]

where B_c(r) = {v ∈ V | wt(v + c) ≤ r}.

For such a code we see that the ML decision rule is uniquely determined: if v ∈ V we take d(v) = c, where c ∈ C is the unique codeword lying in B_v(r), with r chosen in accordance with the above. Furthermore, as a result of Lemma 1.3, we conclude that the ML decision rule is necessarily a standard array decision rule.

This is all relevant, because of

³ Again, this restriction on p might not be necessary.
⁴ In the definition of BER in MacWilliams–Sloane, it is tacitly assumed that the encoding scheme is systematic.
⁵ If n = 2^l − 1 and k = n − l, we sometimes call H(W) the (n, k)-Hamming code.


Lemma 3.1. The Hamming code C = H(W) ⊆ V is a perfect code of minimal weight 3.

Proof. Since C has minimal weight 3, it already follows that for all c ≠ c′ in C, we have B_c(1) ∩ B_{c′}(1) = ∅. Next, let dim W = l; it is clear that for all c ∈ C, |B_c(1)| = n + 1 = 2^l. Therefore,

\[ \Bigl|\bigcup_{c\in C} B_c(1)\Bigr| = (n+1)|C| = (2^l)(2^{n-l}) = 2^n = |V|. \]

4 Second-Order BER for the Hamming Codes

Let W be an l-dimensional vector space over the binary field F, and let C = H(W) ⊆ V = F⟨P⟩ be the corresponding Hamming code, where, as above, P is the set of nonzero vectors of W. Relative to the unique ML standard array decision rule d : V → C and a systematic encoding scheme E : M = F^k → C, k = 2^l − l − 1, we shall compute the second-order term (= coefficient of p^2) in the bit error rate. Indeed, this makes sense, as Proposition 2.1 shows that the bit error rate, as well as the post-decision bit error rate, are polynomials in the crossover probability p of the BSC.

We wish to give an intrinsic characterization of systematic encoding schemes E : M → C. If n = 2^l − 1, then any fixed ordering of the vectors in P gives an isomorphism F^n → V = F⟨P⟩. For convenience, if S ⊆ P is a subset, let [S] = Σ_{s∈S} s ∈ W. Next, let

B = {w_1, w_2, . . . , w_l} be a basis of W. For each j ≥ 2, let S_{ij} ⊆ B, i = 1, 2, . . . , $\binom{l}{j}$, be the distinct subsets of size j in B. Note that

\[ \sum_{j=2}^{l}\binom{l}{j} = 2^l - l - 1, \]

and that the vectors [S_{ij}] ∈ W are all distinct and none is in B. Next, for each nonzero vector w ∈ P, let µ_w : F → V = F⟨P⟩ be the w-th coordinate function. If Q ⊆ P, define the element [[Q]] ∈ V by setting

\[ [[Q]] = \sum_{q\in Q} \mu_q(1) \in V. \]

Finally, define the vectors r_{ij} ∈ V, 2 ≤ j ≤ l, 1 ≤ i ≤ $\binom{l}{j}$, by setting

\[ r_{ij} = [[\,\{[S_{ij}]\} \cup S_{ij}\,]]. \]

For example, consider the (7, 4)-Hamming code H(W), and so W is 3-dimensional and has basis B = {w_1, w_2, w_3}. Take the ordering of the j-element subsets of B


to be {w_1, w_2}, {w_1, w_3}, {w_2, w_3}, {w_1, w_2, w_3}. Then relative to this ordering, the elements r_{ij} ∈ V are the row vectors

r12 = (1, 0, 0, 0, 1, 1, 0),

r22 = (0, 1, 0, 0, 1, 0, 1),

r32 = (0, 0, 1, 0, 0, 1, 1),

r13 = (0, 0, 0, 1, 1, 1, 1).

It is clear that these vectors form a basis of C = H(W) and that the corresponding matrix is in row echelon form. Thus, the encoding scheme

E : (1, 0, 0, 0) ↦ r12, (0, 1, 0, 0) ↦ r22, (0, 0, 1, 0) ↦ r32, (0, 0, 0, 1) ↦ r13

is systematic. Finally, it is not hard to see that any systematic encoding scheme must arise in the above fashion.
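The claim can be checked mechanically; the following Python sketch confirms that these four rows generate a (7, 4) code of minimal weight 3 and that the generator matrix has the systematic form [I_4 | A].

```python
from itertools import product

rows = [(1,0,0,0,1,1,0),
        (0,1,0,0,1,0,1),
        (0,0,1,0,0,1,1),
        (0,0,0,1,1,1,1)]

def combo(m):
    """The codeword corresponding to the message m = (a1, a2, a3, a4)."""
    return tuple(sum(mi * r[i] for mi, r in zip(m, rows)) % 2 for i in range(7))

code = {combo(m) for m in product((0, 1), repeat=4)}
print(len(code))                                   # 16 codewords
print(min(sum(c) for c in code if any(c)))         # minimal weight 3
print(all(r[:4] == tuple(int(i == j) for i in range(4)) for j, r in enumerate(rows)))  # [I_4 | A]
```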

We let BER^(2), BER^(2)_pd be the second-order bit error rate and post-decision bit error rate, respectively, of the Hamming code using the unique ML standard array decision rule d : V → C and a systematic encoding scheme E : M → C. Thus,

\[
k\,\mathrm{BER}^{(2)} = \sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(Dd(e))\,\mathrm{Prob}(e) = \sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(Dd(e))\,p^2(1-p)^{n-2},
\]

and

\[
n\,\mathrm{BER}^{(2)}_{pd} = \sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(d(e))\,\mathrm{Prob}(e) = \sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(d(e))\,p^2(1-p)^{n-2}.
\]

We prove below that at least the second order bit error rates do agree:

Theorem 4.1. For the Hamming code H(W) with decision rule and encoding scheme as above,

\[ \mathrm{BER}^{(2)} = \mathrm{BER}^{(2)}_{pd} = \frac{3}{2}(n-1)\,p^2(1-p)^{n-2}; \]

in particular, the coefficient of p^2 in both bit error rates is (3/2)(n − 1).

Proof. Perhaps we should note first that neither BER nor BERpd contains a nonzero linear term in p. This is because error vectors of weight 1 are closest to the zero vector in C, and hence if wt(e) = 1, then d(e) = 0, i.e., such errors get "corrected." Next, note that if wt(e) = 2, then d(e) ∈ C is necessarily a vector of weight 3. Therefore,

\[ \frac{1}{n}\sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(d(e)) = \frac{3}{n}\binom{n}{2} = \frac{3}{2}(n-1). \]

On the other hand, note that each codeword c ∈ C of weight 3 is d(e) for precisely three error vectors e ∈ V of weight 2. Therefore,

\[ \frac{1}{k}\sum_{\substack{e\in V\\ \mathrm{wt}(e)=2}} \mathrm{wt}(Dd(e)) = \frac{3}{k}\sum_{\substack{c\in C\\ \mathrm{wt}(c)=3}} \mathrm{wt}(D(c)). \]


Therefore, we have reduced the problem to that of showing

\[ \sum_{\substack{c\in C\\ \mathrm{wt}(c)=3}} \mathrm{wt}(D(c)) = \frac{k}{2}(n-1). \]

We now recall the basis elements {r_{ij} | 2 ≤ j ≤ l, 1 ≤ i ≤ $\binom{l}{j}$} of C = H(W). Clearly the elements of C such that wt(D(c)) = 1 are the basis elements r_{ij}. However, those of weight 3 (in C) are precisely the basis vectors r_{i2}, 1 ≤ i ≤ $\binom{l}{2}$; i.e., the number of vectors c ∈ C of weight 3 such that wt(D(c)) = 1 is $\binom{l}{2}$. Next, the vectors in C with wt(D(c)) = 2 are of the form r_{ij} + r_{uv}, where (i, j) ≠ (u, v). Such a vector has weight 3 in C precisely when |S_{ij}| = |S_{uv}| + 1 and S_{ij} ⊇ S_{uv}, or when |S_{ij}| = |S_{uv}| − 1 and S_{ij} ⊆ S_{uv}. The number of such subsets can be enumerated thus:

\[
\binom{l}{3}\binom{3}{2} + \binom{l}{4}\binom{4}{3} + \cdots + \binom{l}{l}\binom{l}{l-1} = \sum_{s=3}^{l}\binom{l}{s}\binom{s}{s-1}.
\]

This quantity can be computed fairly easily. We have, by the Binomial Theorem, that

\[ (1+x)^l = \sum_{s=0}^{l}\binom{l}{s}x^s, \]

and so

\[
l(1+x)^{l-1} = \frac{d}{dx}(1+x)^l = \sum_{s=1}^{l}\binom{l}{s}s\,x^{s-1} = \sum_{s=1}^{l}\binom{l}{s}\binom{s}{s-1}x^{s-1}.
\]

Therefore,

\[ \sum_{s=1}^{l}\binom{l}{s}\binom{s}{s-1} = l(1+1)^{l-1} = l\,2^{l-1}, \]

from which it follows that

\[
\begin{aligned}
\sum_{s=3}^{l}\binom{l}{s}\binom{s}{s-1} &= l\,2^{l-1} - \binom{l}{2}\binom{2}{1} - \binom{l}{1}\binom{1}{0}\\
&= l\,2^{l-1} - l(l-1) - l\\
&= l(2^{l-1} - l).
\end{aligned}
\]

In other words, the number of codewords c ∈ C of weight 3 with wt(D(c)) = 2 is l(2^{l−1} − l). Finally, the number of codewords c ∈ C of weight 3 and having wt(D(c)) = 3 is equal to

\[ (\#\text{ codewords in } C \text{ of weight } 3) - \binom{l}{2} - l(2^{l-1}-l). \]

Clearly, the number of codewords of weight 3 in C is equal to the number of 2-dimensional subspaces in W, which in turn is given by the Gaussian coefficient

\[ \begin{bmatrix} l \\ 2 \end{bmatrix}_2 = \frac{(2^l-1)(2^l-2)}{(2^2-1)(2^2-2)} = \frac{1}{3}\binom{n}{2}. \]


Therefore, the number of codewords c ∈ C of weight 3 and having wt(D(c)) = 3 is equal to

\[ \frac{1}{3}\binom{n}{2} - \binom{l}{2} - l(2^{l-1}-l). \]

Putting all this together, we have

\[
\sum_{\substack{c\in C\\ \mathrm{wt}(c)=3}} \mathrm{wt}(D(c)) = \binom{l}{2} + 2\,l(2^{l-1}-l) + 3\left[\frac{1}{3}\binom{n}{2} - \binom{l}{2} - l(2^{l-1}-l)\right] = \frac{k}{2}(n-1)
\]

after some calculation!

This completes the proof.
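Theorem 4.1 can also be checked numerically for the (7, 4) code; the following Python sketch computes both second-order sums by brute force and recovers the common value 3(n − 1)/2 = 9.

```python
from itertools import product

rows = [(1,0,0,0,1,1,0), (0,1,0,0,1,0,1), (0,0,1,0,0,1,1), (0,0,0,1,1,1,1)]
n, k = 7, 4
add = lambda a, b: tuple((x + y) % 2 for x, y in zip(a, b))
wt = lambda v: sum(v)

# All 16 codewords, each stored with its message D(c) = first four coordinates.
code = {}
for m in product((0, 1), repeat=k):
    c = tuple(sum(mi * r[i] for mi, r in zip(m, rows)) % 2 for i in range(n))
    code[c] = m

def decide(v):
    """ML rule for a perfect code: the unique codeword within Hamming distance 1 of v."""
    for c in code:
        if wt(add(v, c)) <= 1:
            return c

weight2 = [e for e in product((0, 1), repeat=n) if wt(e) == 2]
sum_pd  = sum(wt(decide(e)) for e in weight2)          # post-decision sum
sum_dec = sum(wt(code[decide(e)]) for e in weight2)    # post-decoding sum
print(sum_pd / n, sum_dec / k)                         # both equal 3*(n-1)/2 = 9.0
```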

5 van Lint’s Calculation of BERpd

The calculation of BERpd for the Hamming codes, first given by van Lint,⁶ is actually fairly easy. To this end let C = H(W) be an (n, k)-Hamming code, where n = 2^l − 1, k = 2^l − l − 1. For each integer i = 0, 1, 2, . . . , n, let

A_i = #(codewords of weight i).

Thus, we have already seen that A_0 = 1, A_1 = A_2 = 0, and A_3 = $\frac{1}{3}\binom{n}{2}$.

We have

\[
\begin{aligned}
n\,\mathrm{BER}_{pd} &= \sum_{e\in V} \mathrm{wt}(d(e))\,\mathrm{Prob}(e)\\
&= \sum_{c\in C}\ \sum_{e\in B_c(1)} \mathrm{wt}(d(e))\,\mathrm{Prob}(e)\\
&= \sum_{i=0}^{n}\ \sum_{\substack{c\in C\\ \mathrm{wt}(c)=i}}\ \sum_{e\in B_c(1)} \mathrm{wt}(d(e))\,\mathrm{Prob}(e).
\end{aligned}
\]

Next, for a fixed codevector c ∈ C of weight i, note that

\[
\sum_{e\in B_c(1)} \mathrm{wt}(d(e))\,\mathrm{Prob}(e) = i^2 p^{i-1}(1-p)^{n-i+1} + i\,p^i(1-p)^{n-i} + i(n-i)p^{i+1}(1-p)^{n-i-1} = i\,P(p,i),
\]

where we have set

\[
P(p,i) = i\,p^{i-1}(1-p)^{n-i+1} + p^i(1-p)^{n-i} + (n-i)p^{i+1}(1-p)^{n-i-1}.
\]

Therefore, it follows that

\[
\begin{aligned}
n\,\mathrm{BER}_{pd} &= \sum_{i=0}^{n} iA_i\,P(p,i)\\
&= \sum_{i=0}^{n} iA_i\bigl[i\,p^{i-1}(1-p)^{n-i+1} + p^i(1-p)^{n-i} + (n-i)p^{i+1}(1-p)^{n-i-1}\bigr].
\end{aligned}
\]

⁶ J. H. van Lint, Coding Theory, Lecture Notes in Mathematics, vol. 201, Springer-Verlag, New York, 1973, pp. 25–26.


Therefore, we see that for the Hamming codes, the post-decision bit error rate depends solely on the so-called weight enumerator polynomial, which is given by

\[ A(x) = \sum_{i=0}^{n} A_i x^i. \]

Van Lint has also computed A(x); we shall sketch his development. Recall that A_i denotes the number of codewords of weight i. Notice that the weight-i vectors v ∈ V come from three sources:

(1) Those that are already codevectors of weight i.

(2) Those that are distance 1 from a codevector of weight i + 1; note that each such codevector gives rise to $\binom{i+1}{i}$ = i + 1 vectors of weight i.

(3) Those that are distance 1 from a codevector of weight i − 1; each such codevector gives rise to n − i + 1 vectors of weight i.

Since the balls of radius 1 about the codewords partition V, it follows from the above that

\[ \binom{n}{i} = A_i + (i+1)A_{i+1} + (n-i+1)A_{i-1}. \]

Multiply both sides of this equation by x^i and sum over i = 0, 1, . . . , n + 1:

\[
\sum_{i=0}^{n+1}\binom{n}{i}x^i = \sum_{i=0}^{n+1} A_i x^i + \sum_{i=0}^{n+1} (i+1)A_{i+1}x^i + \sum_{i=0}^{n+1} (n-i+1)A_{i-1}x^i,
\]

and so

\[
\sum_{i=0}^{n}\binom{n}{i}x^i = \sum_{i=0}^{n} A_i x^i + \frac{d}{dx}\sum_{i=0}^{n+1} A_{i+1}x^{i+1} + n\sum_{i=0}^{n+1} A_{i-1}x^i - \sum_{i=0}^{n+1} (i-1)A_{i-1}x^i.
\]

This implies, of course, that

\[
\begin{aligned}
(1+x)^n &= A(x) + A'(x) + nxA(x) - x^2\sum_{i=0}^{n} iA_i x^{i-1}\\
&= A(x) + A'(x) + nxA(x) - x^2A'(x),
\end{aligned}
\]

which can be written as the first-order Bernoulli equation

\[ A'(x) + \left(\frac{1+nx}{1-x^2}\right)A(x) = \frac{(1+x)^n}{1-x^2}. \]

Recall that the first-order Bernoulli equation y′ + p(x)y = q(x) is solved by multiplying through by the integrating factor u(x) = e^{∫p(x)dx}, with the solution being

\[ y = \frac{1}{u(x)}\int u(x)\,q(x)\,dx. \]


If we apply this to the above, the final result is that

\[ A(x) = \frac{1}{n+1}(1+x)^n + \frac{n}{n+1}(1+x)^{(n-1)/2}(1-x)^{(n+1)/2}. \]
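The closed form is easy to check numerically; the following Python sketch compares its coefficients with a brute-force weight count for the (7, 4) code.

```python
from itertools import product
from math import comb

rows = [(1,0,0,0,1,1,0), (0,1,0,0,1,0,1), (0,0,1,0,0,1,1), (0,0,0,1,1,1,1)]
n = 7

# Brute-force weight distribution: expect 1 + 7x^3 + 7x^4 + x^7.
counts = [0] * (n + 1)
for m in product((0, 1), repeat=4):
    c = [sum(mi * r[i] for mi, r in zip(m, rows)) % 2 for i in range(n)]
    counts[sum(c)] += 1
print(counts)                      # [1, 0, 0, 7, 7, 0, 0, 1]

# Coefficients of (1/(n+1))(1+x)^n + (n/(n+1))(1+x)^((n-1)/2)(1-x)^((n+1)/2).
def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binom_poly(sign, deg):         # coefficients of (1 + sign*x)^deg
    return [comb(deg, i) * sign**i for i in range(deg + 1)]

closed = [c / (n + 1) for c in binom_poly(1, n)]
homog = poly_mul(binom_poly(1, (n - 1) // 2), binom_poly(-1, (n + 1) // 2))
closed = [a + n * h / (n + 1) for a, h in zip(closed, homog)]
print(closed)                      # matches the brute-force count
```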

We return now to the computation of BERpd. If we set q = 1− p, then we have

\[
\begin{aligned}
n\,\mathrm{BER}_{pd} &= q^n\sum_{i=0}^{n} iA_i\bigl\{x^i + (n-i)x^{i+1} + ix^{i-1}\bigr\}\\
&= q^n\bigl\{((n-1)x^2 + x + 1)A'(x) + (x - x^3)A''(x)\bigr\},
\end{aligned}
\]

where x = p/q = p/(1 − p).

After some calculation, this ultimately boils down to

\[
n\,\mathrm{BER}_{pd} = \frac{n}{n+1}\left((n-1)\frac{p^2}{1-p} + 1\right)\left[1 - (1+(n-1)p)(1-2p)^{(n-1)/2}\right] + \frac{p(1-2p)}{1-p}\cdot\frac{n(n-1)}{n+1}\left[1 + (np^2 - p^2 + 4p - 1)(1-2p)^{(n-3)/2}\right].
\]

In order to interpret the limiting value of the above, note first that the expression α = np represents the expected number of codebit errors in unencoded transmission of n codebits. So we fix α = np and let n → ∞, p → 0 in the above expression of nBERpd and find that

\[
n\,\mathrm{BER}_{pd} \to \bigl[1 - (1+\alpha)e^{-\alpha}\bigr] + \alpha(1 - e^{-\alpha}) = \alpha + \bigl[1 - (1+2\alpha)e^{-\alpha}\bigr].
\]

In particular, if (1 + 2α)e^{−α} < 1 (which roughly says that α > 1.2564), we see that asymptotically nBERpd is greater than α, i.e., the Hamming codes do worse than with no coding at all!

6 BER = BERpd for the (7, 4)-Hamming Code

In the previous section, we saw that if A(x) = Σ_{i=0}^{n} A_i x^i is the weight enumerator for the Hamming code C, then the post-decision bit error rate is given by

\[ \mathrm{BER}_{pd} = \frac{1}{n}\sum_{i=0}^{n} iA_i\,P(p,i), \]

where

\[ P(p,i) = i\,p^{i-1}(1-p)^{n-i+1} + p^i(1-p)^{n-i} + (n-i)p^{i+1}(1-p)^{n-i-1}. \]


On the other hand, relative to decoding D : C → M, we have that the post-decoding bit error rate is

\[
\begin{aligned}
\mathrm{BER} &= \frac{1}{k}\sum_{e\in V} \mathrm{wt}(D(d(e)))\,\mathrm{Prob}(e)\\
&= \frac{1}{k}\sum_{c\in C}\ \sum_{e\in B_c(1)} \mathrm{wt}(D(d(e)))\,\mathrm{Prob}(e)\\
&= \frac{1}{k}\sum_{i=0}^{n}\ \sum_{\substack{c\in C\\ \mathrm{wt}(c)=i}}\ \sum_{e\in B_c(1)} \mathrm{wt}(D(d(e)))\,\mathrm{Prob}(e)\\
&= \frac{1}{k}\sum_{i=0}^{n}\ \sum_{\substack{c\in C\\ \mathrm{wt}(c)=i}} \mathrm{wt}(D(c))\,P(p,i)\\
&= \frac{1}{k}\sum_{i=0}^{n} P(p,i)\sum_{\substack{c\in C\\ \mathrm{wt}(c)=i}} \mathrm{wt}(D(c)).
\end{aligned}
\]

Therefore, we see that BERpd = BER provided that we can show

\[ \sum_{\substack{c\in C\\ \mathrm{wt}(c)=i}} \mathrm{wt}(D(c)) = \frac{ki}{n}\,A_i, \]

for each i = 0, 1, . . . , n.

For the (7, 4)-Hamming code, the weight enumerator polynomial is A(x) = 1 + 7x^3 + 7x^4 + x^7. Verifying that

\[
\sum_{\substack{c\in C\\ \mathrm{wt}(c)=3}} \mathrm{wt}(D(c)) = 12, \qquad \sum_{\substack{c\in C\\ \mathrm{wt}(c)=4}} \mathrm{wt}(D(c)) = 16, \qquad \sum_{\substack{c\in C\\ \mathrm{wt}(c)=7}} \mathrm{wt}(D(c)) = 4,
\]

is entirely routine and can be left to the reader.
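These sums can also be verified with a few lines of Python; the sketch below tabulates Σ wt(D(c)) by codeword weight, using the systematic encoding of Section 4 (message = first four coordinates).

```python
from itertools import product

rows = [(1,0,0,0,1,1,0), (0,1,0,0,1,0,1), (0,0,1,0,0,1,1), (0,0,0,1,1,1,1)]

sums = {}
for m in product((0, 1), repeat=4):
    c = tuple(sum(mi * r[i] for mi, r in zip(m, rows)) % 2 for i in range(7))
    sums[sum(c)] = sums.get(sum(c), 0) + sum(m)    # accumulate wt(D(c)) = wt(m) by wt(c)
print(sums)    # expected: {0: 0, 3: 12, 4: 16, 7: 4}
```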


7 Performance of the Hamming Codes

This final section can best be thought of as an "engineering appendix," as it documents how electrical engineers view the performance of codes. First of all we need a way to compute the crossover probability of error for a BSC. This depends on a number of factors, including the mode of binary signaling, the mode of reception, the energy E_b per sent bit, and the white noise spectral density N_0. A common assumption is to use what is called "bipolar signaling," and invoke a theorem of electrical engineering which asserts that the probability of error (= crossover probability) is minimized precisely when the "matched filter" reception design⁷ is used, with the resulting error probability given by

\[ p = Q\left(\sqrt{\frac{2E_b}{N_0}}\right), \]

and where the Q function is defined by

\[ Q(z) = \frac{1}{\sqrt{2\pi}}\int_z^{\infty} e^{-\lambda^2/2}\,d\lambda. \]

In applying this to, say, the (7, 4)-Hamming code, we must realize that the above calculation is predicated on having invested E_b joules in one bit. However, in the (7, 4)-Hamming code there is redundancy to the extent that seven actual electronic "bits" are sent for every four message bits. Since the BER is reflective of the message bits, we must realize that if E_b joules are dedicated to each message bit, then only (4/7)E_b joules go into each codebit (the physically transmitted bit).

Therefore, if we write the bit error rate as a polynomial in the crossover probability p:

\[ \mathrm{BER} = B(p), \]

then in terms of E_b/N_0 (the "engineering standard") we would use

\[ \mathrm{BER} = B\Bigl(Q\bigl(\sqrt{8E_b/7N_0}\bigr)\Bigr) \]

for the computation.
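As a rough numerical sketch (not from the original notes), the comparison can be tabulated in Python; Q is evaluated via math.erfc, and the sample E_b/N_0 values are arbitrary assumptions.

```python
from math import sqrt, erfc

def Q(z):
    """The Gaussian tail function Q(z)."""
    return 0.5 * erfc(z / sqrt(2))

def ber_uncoded(ebn0_db):
    ebn0 = 10 ** (ebn0_db / 10)
    return Q(sqrt(2 * ebn0))

def ber_hamming74(ebn0_db):
    ebn0 = 10 ** (ebn0_db / 10)
    p = Q(sqrt(2 * (4 / 7) * ebn0))    # each codebit carries only (4/7) E_b
    return 9 * p * p                   # second-order approximation from Theorem 4.1

for db in (4, 6, 8, 10):
    print(db, "dB:", ber_uncoded(db), ber_hamming74(db))
```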

In the graph below, the second-order approximation BER ≈ 9p² = 9[Q(√(8E_b/7N_0))]², taken from Theorem 4.1, has been used. The resulting performance graph, with the comparison taken against unencoded transmission, is as below:

⁷ Such a filter is characterized by the fact that its impulse response is matched to a (reverse copy of) the known input signal.


Performance of (7, 4)-Hamming Code vs. Unencoded Transmission (· · · )

[Figure: BER, plotted on a logarithmic scale from 1 down to 10^{-9}, versus E_b/N_0 from −1 to 12 dB.]

The way engineers read this graph is by saying, for example, that if one wants a bit error rate of no worse than 10^{-8}, then one must arrange to send signals that are at least 11.5 dB over the background white noise for the Hamming code and at least 12 dB over the background white noise for the unencoded transmission. Put differently, at a BER of 10^{-8} we see roughly a 1/2 dB gain over unencoded transmission.

Below, we have given the performance graph for the (15, 11)-Hamming code. In this case, at a BER of 10^{-8} we see approximately a 1.5 dB gain as compared with the unencoded transmission.

Performance of (15, 11)-Hamming Code vs. Unencoded Transmission (· · · )

[Figure: BER, plotted on a logarithmic scale from 1 down to 10^{-12}, versus E_b/N_0 from −1 to 12 dB.]