Another question consider a message (sequence of characters) from {a, b, c, d} encoded using the...


Transcript of Another question consider a message (sequence of characters) from {a, b, c, d} encoded using the...

Page 1:

Another question

• consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown

• what is the probability that a randomly chosen bit from the encoded message is 1?

symbol   probability   codeword
a        1/2           0
b        1/4           10
c        1/8           110
d        1/8           111

P(1) = [ Σ_i P(1|s_i) × P(s_i) × l_i ] / [ Σ_i P(s_i) × l_i ]

     = (0 × 1/2 × 1  +  1/2 × 1/4 × 2  +  2/3 × 1/8 × 3  +  3/3 × 1/8 × 3) / (1/2 × 1  +  1/4 × 2  +  1/8 × 3  +  1/8 × 3)

     = (7/8) / (7/4) = 1/2

     = expected number of 1s / expected number of bits
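A minimal sketch of this calculation in Python, assuming the code table above; the symbol probabilities and codewords are taken directly from the slide:

```python
# Probability that a randomly chosen bit of the encoded message is 1.
# Code table from the slide: symbol -> (probability, codeword).
code = {"a": (1/2, "0"), "b": (1/4, "10"), "c": (1/8, "110"), "d": (1/8, "111")}

expected_ones = sum(p * cw.count("1") for p, cw in code.values())   # 7/8
expected_bits = sum(p * len(cw) for p, cw in code.values())         # 7/4

print(expected_ones / expected_bits)   # 0.5
```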


Page 3:

Shannon-Fano theorem

• Channel capacity– Entropy (bits/sec) of encoder determined by entropy of source (bits/sym)– If we increase the rate at which source generates information (bits/sym) eventually we

will reach the limit of the encoder (bits/sec). At this point the encoder’s entropy will have reached a limit

• This is the channel capacity• S-F theorem

– Source has entropy H bits/symbol– Channel has capacity C bits/sec– Possible to encode the source so that its symbols can be transmitted at up to C/H

symbols per second, but no faster– (general proof in notes)

[diagram: source → encode/transmit → channel → receive/decode → destination]
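As a quick illustration of the C/H limit, here is a minimal Python sketch; the source is the {a, b, c, d} source from the first slide (H = 7/4 bits/symbol), and the channel capacity value is an assumed number for illustration only:

```python
import math

# Entropy of the {a, b, c, d} source from the first slide (bits/symbol).
probs = [1/2, 1/4, 1/8, 1/8]
H = sum(p * math.log2(1 / p) for p in probs)   # = 1.75 bits/symbol

C = 1400.0        # assumed channel capacity in bits/sec (illustrative value)
print(C / H)      # = 800.0 symbols/sec -- the fastest possible symbol rate
```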

Page 4:

Conditional Entropy (lecture 3)

• conditional entropy of A given B=bk is the entropy of the probability distribution Pr(A|B=bk)

• the conditional entropy of A given B is the average of this quantity over all bk

H(A | B = b_k) = Σ_i^m P(a_i | b_k) log [ 1 / P(a_i | b_k) ]

H(A | B) = Σ_j^n P(b_j) ( Σ_i^m P(a_i | b_j) log [ 1 / P(a_i | b_j) ] )

         = Σ_{i,j} P(a_i, b_j) log [ 1 / P(a_i | b_j) ]

the average uncertainty about A when B is known
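A minimal sketch of H(A|B) in Python, assuming the joint distribution is supplied as a dictionary P(a, b); the function name and the toy distribution are illustrative only:

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(A|B) = sum over (a, b) of P(a, b) * log2(1 / P(a|b))."""
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_b[b] += p
    h = 0.0
    for (a, b), p in joint.items():
        if p > 0:
            p_a_given_b = p / p_b[b]
            h += p * math.log2(1 / p_a_given_b)
    return h

# Toy joint distribution P(a, b) (illustrative values only).
joint = {("0", "0"): 0.4, ("0", "1"): 0.1, ("1", "0"): 0.1, ("1", "1"): 0.4}
print(conditional_entropy(joint))   # ~0.72 bits
```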

Page 5:

Mutual information (lecture 3)

H(B, A) = H(A, B)

H(B) + H(A|B) = H(A) + H(B|A)

Rearrange:  H(A) – H(A|B) = H(B) – H(B|A)

            I(A ; B) = I(B ; A)

I(A ; B) = information about A contained in B

[Venn diagram: H(A,B) is the union of H(A) and H(B); it decomposes into H(A|B), I(A;B) and H(B|A)]

Page 6:

Mutual information - example

A: 0 with probability p, 1 with probability 1−p

B: 0 with probability q, 1 with probability 1−q

C: c = (a + b) mod 2

• if p = q = 0.5, (i) what is the probability that c = 0? (ii) what is I(C;A)?
• what if p = 0.5 and q = 0.1?
• what about the general case, any p, q?

[diagram: transmit (A) → receive (C), with noise (B) added on the channel]
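A minimal Python sketch that answers these questions numerically; it uses I(C;A) = H(C) − H(C|A) together with the fact, shown on the next slide, that H(C|A) = H(B). The function names are illustrative:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def xor_channel(p, q):
    """A=0 w.p. p, B=0 w.p. q, C = A xor B. Returns P(c=0) and I(C;A)."""
    p_c0 = p * q + (1 - p) * (1 - q)      # c=0 exactly when a and b agree
    i_ca = h2(p_c0) - h2(q)               # I(C;A) = H(C) - H(B)
    return p_c0, i_ca

print(xor_channel(0.5, 0.5))   # (0.5, 0.0) -- the noise destroys all information
print(xor_channel(0.5, 0.1))   # (0.5, ~0.531)
```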

Page 7:

General case

a=0 with probability p - in this case, Pr(c=0) = q, Pr(c=1) = 1-q

a=1 with probability 1-p - in this case, Pr(c=0) = 1-q, Pr(c=1) = q

average uncertainty about C given A:

H(C|A) = q log(1/q) + (1−q) log(1/(1−q)) = H(B)

I(A;C) = H(C) − H(B)

[Venn diagram: H(A) and H(C) overlap; the part of H(C) outside the overlap is H(C|A) = H(B), and the part of H(A) outside the overlap is H(A|C)]

Page 8:

Discrete Channel with Noise

[diagram: source (A) → encode/transmit (X) → channel, with noise added → receive/decode (Y) → destination (B)]

equivocation = H(X | Y)

transmission rate = H(X) − H(X | Y)

channel capacity = max (transmission rate)

Page 9:

Noisy Channels

• A noisy channel consists of an input alphabet X, an output alphabet Y and a set of conditional distributions Pr(y|x) for each y ∈ Y and x ∈ X

binary symmetric channel (x, y ∈ {0, 1}):

P(y=0 | x=0) = 1 − f    P(y=0 | x=1) = f
P(y=1 | x=0) = f        P(y=1 | x=1) = 1 − f

Page 10:

Inferring input from output

P(y=0 | x=0) = 0.85    P(y=0 | x=1) = 0.15
P(y=1 | x=0) = 0.15    P(y=1 | x=1) = 0.85

error probability = 0.15; source distribution P(x=0) = 0.9

observe y = 1; use Bayes:

P(x | y) = P(y | x) × P(x) / P(y)

P(x=1 | y=1) = P(y=1 | x=1) P(x=1) / Σ_x′ P(y=1 | x′) P(x′)

             = (0.85 × 0.1) / (0.85 × 0.1 + 0.15 × 0.9) ≈ 0.39

x=0 is still more probable than x=1
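A minimal sketch of this Bayes calculation in Python, using the channel matrix and source distribution from the slide; the function name is illustrative:

```python
def posterior_x1_given_y1(p_x0=0.9, f=0.15):
    """P(x=1 | y=1) for a binary symmetric channel with error probability f."""
    p_x1 = 1 - p_x0
    likelihood = (1 - f) * p_x1            # P(y=1|x=1) P(x=1)
    evidence = (1 - f) * p_x1 + f * p_x0   # P(y=1)
    return likelihood / evidence

print(posterior_x1_given_y1())   # ~0.386 -- so x=0 is still the more probable input
```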

Page 11:

Other useful models

binary erasure channel (x ∈ {0, 1}, y ∈ {0, ?, 1}):

P(y=0 | x=0) = 1 − f    P(y=0 | x=1) = 0
P(y=? | x=0) = f        P(y=? | x=1) = f
P(y=1 | x=0) = 0        P(y=1 | x=1) = 1 − f

Z channel (x, y ∈ {0, 1}):

P(y=0 | x=0) = 1        P(y=0 | x=1) = f
P(y=1 | x=0) = 0        P(y=1 | x=1) = 1 − f

Page 12:

Information conveyed by a channel

• input distribution P(x), output distribution P(y)

• mutual information I(X;Y)

• what is the distribution P(x) that maximises I? (this maximum is the channel capacity; a numerical sketch for the binary symmetric channel follows the diagram below)

[Venn diagram: H(X) and H(Y) overlap in I(X;Y), with H(X|Y) and H(Y|X) outside the overlap; I also depends on the error matrix]
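A minimal Python sketch that searches over input distributions P(x) for a binary symmetric channel to find the one that maximises I(X;Y); the grid-search approach and the flip probability f = 0.15 are illustrative assumptions:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info_bsc(p0, f):
    """I(X;Y) for a binary symmetric channel: H(Y) - H(Y|X), with H(Y|X) = h2(f)."""
    p_y0 = p0 * (1 - f) + (1 - p0) * f
    return h2(p_y0) - h2(f)

f = 0.15
best = max((mutual_info_bsc(p0 / 1000, f), p0 / 1000) for p0 in range(1001))
print(best)   # maximised at P(x=0) = 0.5, giving capacity 1 - h2(0.15) ~ 0.39 bits
```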

Page 13:

Shannon’s Fundamental Theorem

• Consider a source with entropy R and a channel with capacity C such that R < C. There is a way of (block) coding the source so that it can be transmitted with arbitrarily small error
• group input symbols together (block code)
• use spare capacity for an error-correcting code (Hamming code etc.)

Page 14:

Example - noisy dice

[diagram: noisy dice channel, inputs 1–6 mapped to outputs 1–6 with some confusion between them]

imagine restricting the input symbols to {2, 5}. This is a non-confusable subset: for any output, we would know the input (similarly {1, 4} or {3, 6})

Page 15:

Outline of proof

• consider a sequence of signals of length N
  – as N increases, the probability of error reduces
  – ("typical" outputs are unlikely to overlap)
  – as N → ∞, Pr(error) → 0
  – e.g. binary symmetric channel, f = 0.15: repeat the signal N times

(figures from Mackay, “Information Theory, Inference, and Learning Algorithms”, CUP)

Page 16:

Outline

• consider a long time T, sequence length N
  – there are 2^{NH(X)} typical source sequences, each occurring with probability 2^{−NH(X)}
  – a typical received signal y corresponds to 2^{NH(X|Y)} possible inputs
  – choose 2^{NR} random input sequences to represent our source messages
  – consider transmitting x_i: if it is corrupted, it may be decoded as x_j where j ≠ i
  – if y is received, it corresponds to a set of inputs S_y

for j ≠ i,  P(x_j ∈ S_y) ≤ Σ_{k≠i} P(x_k ∈ S_y) ≤ 2^{NR} × P(x_k ∈ S_y)

where  P(x_k ∈ S_y) = 2^{NH(X|Y)} / 2^{NH(X)} = 1 / 2^{NC}

so  P(x_j ∈ S_y) ≤ 2^{N(R−C)};  since R < C we can make this as small as we like by choosing large N
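A minimal numeric illustration in Python of how quickly the bound 2^{N(R−C)} shrinks with N; the values of R and C are assumed for illustration only:

```python
# How fast the error bound 2^(N(R - C)) shrinks as N grows,
# for an assumed rate R = 0.3 and capacity C = 0.39 (so R < C).
R, C = 0.3, 0.39
for N in (10, 100, 1000, 10000):
    print(N, 2 ** (N * (R - C)))
# 10 -> ~0.54, 100 -> ~0.002, 1000 -> ~8e-28, 10000 -> ~1e-271
```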

Page 18:

Error Detection / Correction

[diagram: source (A) → encode/transmit with error coding (X) → channel, with noise added → receive/decode with error detection/correction (Y) → destination (B); detected errors can trigger a resend]

• Error-detecting code
  – Detects if one or more digits have been changed
  – Cannot say which digits have changed
  – E.g. parity check
• Error-correcting code
  – Error detection as on the left
  – Can also work out which digits have been changed
  – E.g. Hamming code

Page 19:

Error detection

• If code words are very similar then it is difficult to detect errors
  – e.g. 1010110100 and 1010110101
• If code words are very different then it is easier to detect errors
  – e.g. 1111111111 and 0000000000
• Therefore the more different the code words, the better
  – Measured using the Hamming distance, d
    • number of differing digits
    • e.g. 011011 and 010101 differ in 3 places, therefore d = 3

0 1 1 0 1 1
0 1 0 1 0 1
differ?  positions 3, 4 and 5 differ
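A minimal Python sketch of the Hamming distance, using the two words from the slide:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("011011", "010101"))   # 3
```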

Page 20:

Hamming distance

• Measure of ‘distance’ between words
• Choose the nearest code word, e.g.
  a = 10000
  b = 01100
  c = 10011
  (a nearest-codeword sketch follows the table below)
• Use d to predict the number of errors we can detect/correct, e.g. parity check:
  10000 sent
  10001 rec.    11000 rec.

d ≥ 2e+1 : can correct up to e errors per word
d = 2e   : can correct up to e−1 errors per word, and detect e errors
d ≥ e+1  : can detect up to e errors per word
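A minimal sketch of nearest-codeword decoding in Python, using the code words a, b, c from the slide; the function names are illustrative:

```python
def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

codewords = {"a": "10000", "b": "01100", "c": "10011"}

def distances(received):
    """Hamming distance from the received word to each code word."""
    return {name: hamming(cw, received) for name, cw in codewords.items()}

print(distances("11000"))   # {'a': 1, 'b': 2, 'c': 4} -> nearest code word is a
print(distances("10001"))   # {'a': 1, 'b': 4, 'c': 1} -> tie between a and c:
                            # d(a, c) = 2, so this error is detectable but not correctable
```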

Page 21:

Error correction

• Like an error-detecting code, but needs more bits (obvious really!)
• More efficient when larger code words are being used
• Overhead of coding/decoding arithmetic
• Hamming code
  – D = number of data bits
  – P = number of parity bits
  – C = D + P = code word length
  – Hamming inequality for single error correction: D + P + 1 ≤ 2^P (a small check of this inequality follows below)
  – If P is small it is hardly worth doing; cheaper to re-send the code word
  – If P ≥ 3 some increase in transmission rate is possible
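A minimal Python sketch of the Hamming inequality check: the smallest number of parity bits P needed for single error correction of D data bits. The function name is illustrative:

```python
def min_parity_bits(D):
    """Smallest P with D + P + 1 <= 2**P (single error correction)."""
    P = 1
    while D + P + 1 > 2 ** P:
        P += 1
    return P

for D in (1, 4, 11, 26):
    print(D, min_parity_bits(D))   # D=4 needs P=3: the (7,4) code used on the next slides
```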

Page 22:

Hamming code

• Process:
  – Coding
    • Take the code word from the encoder (before adding parity bits) and multiply it by the generator matrix G using modulo-2 arithmetic
    • This gives the code word (d1, … , dD, p1, … , pP)
  – Decoding
    • Take the received code word (D+P bits) and multiply it by the decoder matrix X using modulo-2 arithmetic
    • This gives us a syndrome (or parity) vector s
    • If s contains all zeros then there are no errors
    • Otherwise s is matched against the columns of X to find the position of a single error

Page 23:

Example

• E.g. d = 4, p = 3G = [ I | A ]

• Encode 1001

• X = [AT | I]

• Receive 1101001 & decode

G = [ I | A ] =

    1 0 0 0 | 1 1 0
    0 1 0 0 | 1 0 1
    0 0 1 0 | 0 1 1
    0 0 0 1 | 1 1 1

CW = [1 0 0 1] × G = [1 0 0 1 0 0 1]   (send)

X = [ A^T | I ] =

    1 1 0 1 | 1 0 0
    1 0 1 1 | 0 1 0
    0 1 1 1 | 0 0 1

received CW = [1 1 0 1 0 0 1]

s = X × (received CW)^T = [1 0 1]^T

s matches column 2 of X, so: error in bit 2
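A minimal Python sketch of this (7,4) Hamming encode/decode cycle, using the G and X matrices from the slide; the variable and function names are illustrative:

```python
import numpy as np

# Generator and decoder matrices from the slide: G = [I | A], X = [A^T | I].
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), A])
X = np.hstack([A.T, np.eye(3, dtype=int)])

def encode(data):                      # data: 4 bits
    return (np.array(data) @ G) % 2    # 7-bit code word (modulo-2 arithmetic)

def decode(received):                  # received: list of 7 bits
    s = (X @ np.array(received)) % 2   # syndrome vector
    if not s.any():
        return list(received[:4])      # all zeros: no error detected
    error_bit = next(j for j in range(7) if np.array_equal(X[:, j], s))
    corrected = list(received)
    corrected[error_bit] ^= 1          # flip the single erroneous bit
    return corrected[:4]

cw = encode([1, 0, 0, 1])                    # -> [1 0 0 1 0 0 1]
print(cw, decode([1, 1, 0, 1, 0, 0, 1]))     # syndrome [1 0 1] -> error in bit 2 -> data 1001
```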

Page 24:

Summary

• probability: joint, conditional, Bayes
• entropy: decomposition, conditional
• mutual information
• sources: simple, Markov, stationary, ergodic
• information capacity
• source coding theorem
• coding: optimal / compression
• channel capacity
• Shannon's fundamental theorem
• error correction/detection

[diagram: source (A) → encode/transmit (X) → channel, with noise → receive/decode (Y) → destination (B)]

Page 25:

Next stop …

weeks 1-6:
• Theory of Discrete Information and Communication Systems

weeks 7-12:
• Communications Systems Performance (Mark Beach), plus coursework
• Languages, Automata and Complexity (Colin Campbell)
• 1st order Predicate Logic (Enza di Tomaso)

Modules:
• EMAT 31520 Information Systems (CSE 3, Eng Maths 3, Knowledge Eng 3)
• EMAT 20530 Logic and Information (CSE 2, Eng Maths 2)
• EENG 32000 Communication Systems (EE 3, Avionics 3)
• EENG M2100 Communication Systems (MSc Comms/Sig Proc)

Page 26:

Pictorial representation

[Venn diagram: H(X) and H(Y) overlap in I(X;Y); H(X|Y) and H(Y|X) are the non-overlapping parts]  (from Volker Kuhn, Bremen)

Page 27:

[figure: error-correcting code with source bits and transmitted bits, the parity bits chosen such that there is even parity in each set; any two 4-bit codewords differ in at least 3 places]

Page 28:

[figure: codewords x_i and x_j]

Page 29:

[figure: graph of 5-bit words arranged by Hamming distance from 00000:
00000;
00001 00010 00100 01000 10000;
00011 00101 01001 10001 00110 01010 10010 10100 01100 11000]