Another question consider a message (sequence of characters) from {a, b, c, d} encoded using the...


Transcript of Another question consider a message (sequence of characters) from {a, b, c, d} encoded using the...

Page 1:

Another question

• consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown

• what is the probability that a randomly chosen bit from the encoded message is 1?

symbol   probability   codeword
a        1/2           0
b        1/4           10
c        1/8           110
d        1/8           111

P(1) = [ Σ_i P(1|s_i) × P(s_i) × l_i ] / [ Σ_i P(s_i) × l_i ]

     = (0 × 1/2 × 1  +  1/2 × 1/4 × 2  +  2/3 × 1/8 × 3  +  3/3 × 1/8 × 3) / (1/2 × 1  +  1/4 × 2  +  1/8 × 3  +  1/8 × 3)

     = (7/8) / (7/4) = 1/2

     = expected number of 1s / expected number of bits
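A minimal sketch of this calculation in Python, assuming the code table above; the symbol probabilities and codewords are taken directly from the slide:

```python
# Probability that a randomly chosen bit of the encoded message is 1.
# Code table from the slide: symbol -> (probability, codeword).
code = {"a": (1/2, "0"), "b": (1/4, "10"), "c": (1/8, "110"), "d": (1/8, "111")}

expected_ones = sum(p * cw.count("1") for p, cw in code.values())   # 7/8
expected_bits = sum(p * len(cw) for p, cw in code.values())         # 7/4

print(expected_ones / expected_bits)   # 0.5
```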


Page 3:

Shannon-Fano theorem

• Channel capacity– Entropy (bits/sec) of encoder determined by entropy of source (bits/sym)– If we increase the rate at which source generates information (bits/sym) eventually we

will reach the limit of the encoder (bits/sec). At this point the encoder’s entropy will have reached a limit

• This is the channel capacity• S-F theorem

– Source has entropy H bits/symbol– Channel has capacity C bits/sec– Possible to encode the source so that its symbols can be transmitted at up to C/H

symbols per second, but no faster– (general proof in notes)

[diagram: source → encode/transmit → channel → receive/decode → destination]
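As a quick illustration of the C/H limit, here is a minimal Python sketch; the source is the {a, b, c, d} source from the first slide (H = 7/4 bits/symbol), and the channel capacity value is an assumed number for illustration only:

```python
import math

# Entropy of the {a, b, c, d} source from the first slide (bits/symbol).
probs = [1/2, 1/4, 1/8, 1/8]
H = sum(p * math.log2(1 / p) for p in probs)   # = 1.75 bits/symbol

C = 1400.0        # assumed channel capacity in bits/sec (illustrative value)
print(C / H)      # = 800.0 symbols/sec -- the fastest possible symbol rate
```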

Page 4:

Conditional Entropy (lecture 3)

• conditional entropy of A given B=bk is the entropy of the probability distribution Pr(A|B=bk)

• the conditional entropy of A given B is the average of this quantity over all bk

H(A | B = b_k) = Σ_i^m P(a_i | b_k) log [ 1 / P(a_i | b_k) ]

H(A | B) = Σ_j^n P(b_j) ( Σ_i^m P(a_i | b_j) log [ 1 / P(a_i | b_j) ] )

         = Σ_{i,j} P(a_i, b_j) log [ 1 / P(a_i | b_j) ]

the average uncertainty about A when B is known
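A minimal sketch of H(A|B) in Python, assuming the joint distribution is supplied as a dictionary P(a, b); the function name and the toy distribution are illustrative only:

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(A|B) = sum over (a, b) of P(a, b) * log2(1 / P(a|b))."""
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_b[b] += p
    h = 0.0
    for (a, b), p in joint.items():
        if p > 0:
            p_a_given_b = p / p_b[b]
            h += p * math.log2(1 / p_a_given_b)
    return h

# Toy joint distribution P(a, b) (illustrative values only).
joint = {("0", "0"): 0.4, ("0", "1"): 0.1, ("1", "0"): 0.1, ("1", "1"): 0.4}
print(conditional_entropy(joint))   # ~0.72 bits
```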

Page 5:

Mutual information (lecture 3)

H(B, A) = H(A, B)

H(B) + H(A|B) = H(A) + H(B|A)

Rearrange:  H(A) – H(A|B) = H(B) – H(B|A)

            I(A ; B) = I(B ; A)

I(A ; B) = information about A contained in B

[Venn diagram: H(A,B) is the union of H(A) and H(B); it decomposes into H(A|B), I(A;B) and H(B|A)]

Page 6:

Mutual information - example

A: 0 with probability p, 1 with probability 1−p

B: 0 with probability q, 1 with probability 1−q

C: c = (a + b) mod 2

• if p = q = 0.5, (i) what is the probability that c = 0? (ii) what is I(C;A)?
• what if p = 0.5 and q = 0.1?
• what about the general case, any p, q?

[diagram: transmit (A) → receive (C), with noise (B) added on the channel]
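A minimal Python sketch that answers these questions numerically; it uses I(C;A) = H(C) − H(C|A) together with the fact, shown on the next slide, that H(C|A) = H(B). The function names are illustrative:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def xor_channel(p, q):
    """A=0 w.p. p, B=0 w.p. q, C = A xor B. Returns P(c=0) and I(C;A)."""
    p_c0 = p * q + (1 - p) * (1 - q)      # c=0 exactly when a and b agree
    i_ca = h2(p_c0) - h2(q)               # I(C;A) = H(C) - H(B)
    return p_c0, i_ca

print(xor_channel(0.5, 0.5))   # (0.5, 0.0) -- the noise destroys all information
print(xor_channel(0.5, 0.1))   # (0.5, ~0.531)
```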

Page 7:

General case

a=0 with probability p - in this case, Pr(c=0) = q, Pr(c=1) = 1-q

a=1 with probability 1-p - in this case, Pr(c=0) = 1-q, Pr(c=1) = q

average uncertainty about C given A:

H(C|A) = q log(1/q) + (1−q) log(1/(1−q)) = H(B)

I(A;C) = H(C) − H(B)

[Venn diagram: H(A) and H(C) overlap; the part of H(C) outside the overlap is H(C|A) = H(B), and the part of H(A) outside the overlap is H(A|C)]

Page 8:

Discrete Channel with Noise

[diagram: source (A) → encode/transmit (X) → channel, with noise added → receive/decode (Y) → destination (B)]

equivocation = H(X | Y)

transmission rate = H(X) − H(X | Y)

channel capacity = max (transmission rate)

Page 9:

Noisy Channels

• A noisy channel consists of an input alphabet X, an output alphabet Y and a set of conditional distributions Pr(y|x) for each y ∈ Y and x ∈ X

binary symmetric channel (x, y ∈ {0, 1}):

P(y=0 | x=0) = 1 − f    P(y=0 | x=1) = f
P(y=1 | x=0) = f        P(y=1 | x=1) = 1 − f

Page 10:

Inferring input from output

P(y=0 | x=0) = 0.85    P(y=0 | x=1) = 0.15
P(y=1 | x=0) = 0.15    P(y=1 | x=1) = 0.85

error probability = 0.15; source distribution P(x=0) = 0.9

observe y = 1; use Bayes:

P(x | y) = P(y | x) × P(x) / P(y)

P(x=1 | y=1) = P(y=1 | x=1) P(x=1) / Σ_x′ P(y=1 | x′) P(x′)

             = (0.85 × 0.1) / (0.85 × 0.1 + 0.15 × 0.9) ≈ 0.39

x=0 is still more probable than x=1
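A minimal sketch of this Bayes calculation in Python, using the channel matrix and source distribution from the slide; the function name is illustrative:

```python
def posterior_x1_given_y1(p_x0=0.9, f=0.15):
    """P(x=1 | y=1) for a binary symmetric channel with error probability f."""
    p_x1 = 1 - p_x0
    likelihood = (1 - f) * p_x1            # P(y=1|x=1) P(x=1)
    evidence = (1 - f) * p_x1 + f * p_x0   # P(y=1)
    return likelihood / evidence

print(posterior_x1_given_y1())   # ~0.386 -- so x=0 is still the more probable input
```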

Page 11:

Other useful models

binary erasure channel (x ∈ {0, 1}, y ∈ {0, ?, 1}):

P(y=0 | x=0) = 1 − f    P(y=0 | x=1) = 0
P(y=? | x=0) = f        P(y=? | x=1) = f
P(y=1 | x=0) = 0        P(y=1 | x=1) = 1 − f

Z channel (x, y ∈ {0, 1}):

P(y=0 | x=0) = 1        P(y=0 | x=1) = f
P(y=1 | x=0) = 0        P(y=1 | x=1) = 1 − f

Page 12:

Information conveyed by a channel

• input distribution P(x), output distribution P(y)

• mutual information I(X;Y)

• what is the distribution P(x) that maximises I? (this maximum is the channel capacity; a numerical sketch for the binary symmetric channel follows the diagram below)

[Venn diagram: H(X) and H(Y) overlap in I(X;Y), with H(X|Y) and H(Y|X) outside the overlap; I also depends on the error matrix]
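A minimal Python sketch that searches over input distributions P(x) for a binary symmetric channel to find the one that maximises I(X;Y); the grid-search approach and the flip probability f = 0.15 are illustrative assumptions:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info_bsc(p0, f):
    """I(X;Y) for a binary symmetric channel: H(Y) - H(Y|X), with H(Y|X) = h2(f)."""
    p_y0 = p0 * (1 - f) + (1 - p0) * f
    return h2(p_y0) - h2(f)

f = 0.15
best = max((mutual_info_bsc(p0 / 1000, f), p0 / 1000) for p0 in range(1001))
print(best)   # maximised at P(x=0) = 0.5, giving capacity 1 - h2(0.15) ~ 0.39 bits
```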

Page 13:

Shannon’s Fundamental Theorem

• Consider a source with entropy R and a channel with capacity C such that R < C. There is a way of (block) coding the source so that it can be transmitted with arbitrarily small error
• group input symbols together (block code)
• use spare capacity for an error-correcting code (Hamming code etc.)

Page 14:

Example - noisy dice

[diagram: noisy dice channel, inputs 1–6 mapped to outputs 1–6 with some confusion between them]

imagine restricting the input symbols to {2, 5}. This is a non-confusable subset: for any output, we would know the input (similarly {1, 4} or {3, 6})

Page 15:

Outline of proof

• consider a sequence of signals of length N
  – as N increases, the probability of error reduces
  – ("typical" outputs are unlikely to overlap)
  – as N → ∞, Pr(error) → 0
  – e.g. binary symmetric channel, f = 0.15: repeat the signal N times

(figures from Mackay, “Information Theory, Inference, and Learning Algorithms”, CUP)

Page 16:

Outline

• consider a long time T, sequence length N
  – there are 2^{NH(X)} typical source sequences, each occurring with probability 2^{−NH(X)}
  – a typical received signal y corresponds to 2^{NH(X|Y)} possible inputs
  – choose 2^{NR} random input sequences to represent our source messages
  – consider transmitting x_i: if it is corrupted, it may be decoded as x_j where j ≠ i
  – if y is received, it corresponds to a set of inputs S_y

for j ≠ i,  P(x_j ∈ S_y) ≤ Σ_{k≠i} P(x_k ∈ S_y) ≤ 2^{NR} × P(x_k ∈ S_y)

where  P(x_k ∈ S_y) = 2^{NH(X|Y)} / 2^{NH(X)} = 1 / 2^{NC}

so  P(x_j ∈ S_y) ≤ 2^{N(R−C)};  since R < C we can make this as small as we like by choosing large N
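A minimal numeric illustration in Python of how quickly the bound 2^{N(R−C)} shrinks with N; the values of R and C are assumed for illustration only:

```python
# How fast the error bound 2^(N(R - C)) shrinks as N grows,
# for an assumed rate R = 0.3 and capacity C = 0.39 (so R < C).
R, C = 0.3, 0.39
for N in (10, 100, 1000, 10000):
    print(N, 2 ** (N * (R - C)))
# 10 -> ~0.54, 100 -> ~0.002, 1000 -> ~8e-28, 10000 -> ~1e-271
```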

Page 18:

Error Detection / Correction

[diagram: source (A) → encode/transmit with error coding (X) → channel, with noise added → receive/decode with error detection/correction (Y) → destination (B); detected errors can trigger a resend]

• Error-detecting code
  – Detects if one or more digits have been changed
  – Cannot say which digits have changed
  – E.g. parity check
• Error-correcting code
  – Error detection as on the left
  – Can also work out which digits have been changed
  – E.g. Hamming code

Page 19:

Error detection

• If code words are very similar then it is difficult to detect errors
  – e.g. 1010110100 and 1010110101
• If code words are very different then it is easier to detect errors
  – e.g. 1111111111 and 0000000000
• Therefore the more different the code words, the better
  – Measured using the Hamming distance, d
    • number of differing digits
    • e.g. 011011 and 010101 differ in 3 places, therefore d = 3

0 1 1 0 1 1
0 1 0 1 0 1
differ?  positions 3, 4 and 5 differ
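A minimal Python sketch of the Hamming distance, using the two words from the slide:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("011011", "010101"))   # 3
```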

Page 20:

Hamming distance

• Measure of ‘distance’ between words
• Choose the nearest code word, e.g.
  a = 10000
  b = 01100
  c = 10011
  (a nearest-codeword sketch follows the table below)
• Use d to predict the number of errors we can detect/correct, e.g. parity check:
  10000 sent
  10001 rec.    11000 rec.

d ≥ 2e+1 : can correct up to e errors per word
d = 2e   : can correct up to e−1 errors per word, and detect e errors
d ≥ e+1  : can detect up to e errors per word
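A minimal sketch of nearest-codeword decoding in Python, using the code words a, b, c from the slide; the function names are illustrative:

```python
def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

codewords = {"a": "10000", "b": "01100", "c": "10011"}

def distances(received):
    """Hamming distance from the received word to each code word."""
    return {name: hamming(cw, received) for name, cw in codewords.items()}

print(distances("11000"))   # {'a': 1, 'b': 2, 'c': 4} -> nearest code word is a
print(distances("10001"))   # {'a': 1, 'b': 4, 'c': 1} -> tie between a and c:
                            # d(a, c) = 2, so this error is detectable but not correctable
```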

Page 21:

Error correction

• Like an error-detecting code, but needs more bits (obvious really!)
• More efficient when larger code words are being used
• Overhead of coding/decoding arithmetic
• Hamming code
  – D = number of data bits
  – P = number of parity bits
  – C = D + P = code word length
  – Hamming inequality for single error correction: D + P + 1 ≤ 2^P (a small check of this inequality follows below)
  – If P is small it is hardly worth doing; cheaper to re-send the code word
  – If P ≥ 3 some increase in transmission rate is possible
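A minimal Python sketch of the Hamming inequality check: the smallest number of parity bits P needed for single error correction of D data bits. The function name is illustrative:

```python
def min_parity_bits(D):
    """Smallest P with D + P + 1 <= 2**P (single error correction)."""
    P = 1
    while D + P + 1 > 2 ** P:
        P += 1
    return P

for D in (1, 4, 11, 26):
    print(D, min_parity_bits(D))   # D=4 needs P=3: the (7,4) code used on the next slides
```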

Page 22:

Hamming code

• Process:
  – Coding
    • Take the code word from the encoder (before adding parity bits) and multiply it by the generator matrix G using modulo-2 arithmetic
    • This gives the code word (d1, … , dD, p1, … , pP)
  – Decoding
    • Take the received code word (D+P bits) and multiply it by the decoder matrix X using modulo-2 arithmetic
    • This gives us a syndrome (or parity) vector s
    • If s contains all zeros then there are no errors
    • Otherwise s is matched against the columns of X to find the position of a single error

Page 23:

Example

• E.g. d = 4, p = 3G = [ I | A ]

• Encode 1001

• X = [AT | I]

• Receive 1101001 & decode

G = [ I | A ] =

    1 0 0 0 | 1 1 0
    0 1 0 0 | 1 0 1
    0 0 1 0 | 0 1 1
    0 0 0 1 | 1 1 1

CW = [1 0 0 1] × G = [1 0 0 1 0 0 1]   (send)

X = [ A^T | I ] =

    1 1 0 1 | 1 0 0
    1 0 1 1 | 0 1 0
    0 1 1 1 | 0 0 1

received CW = [1 1 0 1 0 0 1]

s = X × (received CW)^T = [1 0 1]^T

s matches column 2 of X, so: error in bit 2
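A minimal Python sketch of this (7,4) Hamming encode/decode cycle, using the G and X matrices from the slide; the variable and function names are illustrative:

```python
import numpy as np

# Generator and decoder matrices from the slide: G = [I | A], X = [A^T | I].
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), A])
X = np.hstack([A.T, np.eye(3, dtype=int)])

def encode(data):                      # data: 4 bits
    return (np.array(data) @ G) % 2    # 7-bit code word (modulo-2 arithmetic)

def decode(received):                  # received: list of 7 bits
    s = (X @ np.array(received)) % 2   # syndrome vector
    if not s.any():
        return list(received[:4])      # all zeros: no error detected
    error_bit = next(j for j in range(7) if np.array_equal(X[:, j], s))
    corrected = list(received)
    corrected[error_bit] ^= 1          # flip the single erroneous bit
    return corrected[:4]

cw = encode([1, 0, 0, 1])                    # -> [1 0 0 1 0 0 1]
print(cw, decode([1, 1, 0, 1, 0, 0, 1]))     # syndrome [1 0 1] -> error in bit 2 -> data 1001
```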

Page 24:

Summary

• probability: joint, conditional, Bayes
• entropy: decomposition, conditional
• mutual information
• sources: simple, Markov, stationary, ergodic
• information capacity
• source coding theorem
• coding: optimal / compression
• channel capacity
• Shannon's fundamental theorem
• error correction/detection

[diagram: source (A) → encode/transmit (X) → channel, with noise → receive/decode (Y) → destination (B)]

Page 25:

Next stop …

weeks 1-6:
• Theory of Discrete Information and Communication Systems

weeks 7-12:
• Communications Systems Performance (Mark Beach), plus coursework
• Languages, Automata and Complexity (Colin Campbell)
• 1st order Predicate Logic (Enza di Tomaso)

Modules:
• EMAT 31520 Information Systems (CSE 3, Eng Maths 3, Knowledge Eng 3)
• EMAT 20530 Logic and Information (CSE 2, Eng Maths 2)
• EENG 32000 Communication Systems (EE 3, Avionics 3)
• EENG M2100 Communication Systems (MSc Comms/Sig Proc)

Page 26:

Pictorial representation

[Venn diagram: H(X) and H(Y) overlap in I(X;Y); H(X|Y) and H(Y|X) are the non-overlapping parts]  (from Volker Kuhn, Bremen)

Page 27:

[figure: error-correcting code with source bits and transmitted bits, the parity bits chosen such that there is even parity in each set; any two 4-bit codewords differ in at least 3 places]

Page 28:

[figure: codewords x_i and x_j]

Page 29:

[figure: graph of 5-bit words arranged by Hamming distance from 00000:
00000;
00001 00010 00100 01000 10000;
00011 00101 01001 10001 00110 01010 10010 10100 01100 11000]