Post on 11-Oct-2015
Unit-I: Information Theory & Source Coding
Information Theory
Evaluation of the performance of a digital communication
system
- Efficiency in representing the information of a given
source
- Rate at which information can be transmitted reliably
over a noisy channel
Given an information source & a noisy channel,
information theory provides limits on ---
1) the minimum number of bits per symbol required
to fully represent the source (Source Encoding)
2) the maximum rate at which reliable
communication can take place over the noisy
channel (Channel Encoding)
Uncertainty, Surprise, Information
Discrete Memoryless Source (DMS)
-one output emitted per unit of time; successive outcomes are
independent & identically distributed.
Source output: modelled as a discrete r.v. S
S = {s0, s1, ..., sK-1}
with probabilities
P(S = sk) = Pk ; k = 0, 1, ..., K-1
and condition
Σ Pk = 1   (sum over k = 0 to K-1)
Source output symbols are statistically independent.
When symbol sk is emitted
-a message from the source comes out
-if probability Pk = 1
-no surprise: certainty
-no information
When source symbols occur with different probabilities, i.e. probability Pk is low
-more surprise: uncertainty
-more information
Amount of information grows with 1/(probability of occurrence):
the more unexpected/uncertain an event is, the more information is obtained.
Examples:
Information / self-information: Definition
For the K message symbols of a DMS, m1, m2, ..., mK with
probabilities P1, P2, ..., PK, the amount of information carried by mk is
Ik = log2(1/Pk) = -log2(Pk) bits
Logarithmic base 2: unit bits
base e: nats
base 10: decits / Hartleys
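The self-information definition above can be checked numerically; a minimal Python sketch (the probabilities are illustrative):

```python
import math

def self_information(p: float) -> float:
    """Self-information I = log2(1/p) in bits for a symbol of probability p."""
    return math.log2(1.0 / p)

# A certain event carries no information; rarer events carry more.
print(self_information(1.0))   # 0.0 bits
print(self_information(0.5))   # 1.0 bit
print(self_information(0.25))  # 2.0 bits
```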
Why a logarithmic relation?
Consider the K message symbols of a DMS, m1, m2, ..., mK, each
equiprobable & statistically independent.
Transmission of a symbol carries information I.
All symbols are equiprobable, so each carries the same information;
I depends on K in some way, I = f(K), with f to be determined.
A second symbol in the succeeding interval carries another quantity of
information I; the information due to both is I + I = 2I.
If there are K alternatives in one interval, there are K^2 pairs over both
intervals, so 2I = f(K^2). In general, for m intervals, mI = f(K^m).
The simplest function to satisfy this is the logarithm:
f(K) = A log K, with A a constant of proportionality.
Taking A = 1, I = log K. All K symbols are equiprobable, so PK = 1/K,
i.e. K = 1/PK, and I = log(1/PK).
Properties of information
1) If Pk = 1,
-the receiver is certain about the message being transmitted
-information is ZERO
2) A more uncertain / less probable message carries more information
3) If I1 is the information of message m1 with probability P1, and
I2 is the information of message m2 with probability P2, and m1 & m2
are statistically independent, then
I(m1, m2) = I(m1) + I(m2)
4) For M = 2^N equally likely & independent messages,
the information is N bits.
Proof: probability of each message = 1/M = Pk, so
Ik = log2(1/Pk) = log2(M) = log2(2^N) = N log2(2) = N bits
Entropy H - a measure of information
In a communication system all possible messages
are considered.
For a DMS with source alphabet S having M different
messages/symbols, the average information per source symbol
is described by the entropy H,
which depends only on the probabilities of the symbols.
Derivation: for M different messages/symbols m1, m2, ..., mM with
probabilities P1, P2, ..., PM, suppose a sequence of L messages is
transmitted, with L >> M. Then
H = Σ Pk log2(1/Pk), bits/symbol   (sum over k = 1 to M)
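The entropy formula can be evaluated directly; a minimal sketch (the distribution is illustrative):

```python
import math

def entropy(probs) -> float:
    """Entropy H = sum_k Pk * log2(1/Pk) in bits/symbol; terms with Pk = 0 contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Four equiprobable symbols: H = log2(4) = 2 bits/symbol
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```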
Properties of Entropy
1. H=0, if PK=0 or 1
2. For M equiprobable symbols, H=log2M
Proof: for equiprobable symbols, P1 = P2 = ... = PM = 1/M, so
H = Σ Pk log2(1/Pk)   (sum over k = 1 to M)
  = P1 log2(1/P1) + P2 log2(1/P2) + ... + PM log2(1/PM)
  = (1/M) log2(M) + (1/M) log2(M) + ... + (1/M) log2(M)   (M times)
  = M . {(1/M) log2(M)}
H = log2(M)
3. Entropy is bounded, with upper bound = log2(M),
i.e. 0 ≤ H ≤ log2(M)
Proof:
4. A source transmits two independent messages
with probabilities P & (1-P). Prove that the entropy
is maximum when both the messages are equally
likely.
Proof: H(P) = P log2(1/P) + (1-P) log2(1/(1-P)).
Setting dH/dP = log2((1-P)/P) = 0 gives P = 1-P, i.e. P = 1/2,
so the entropy is maximum (H = 1 bit) when the messages are equally likely.
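The claim can also be checked numerically: the binary entropy H(P) = P log2(1/P) + (1-P) log2(1/(1-P)) peaks at P = 1/2. A sketch:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a two-message source with probabilities p and 1 - p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

# Scan p over a grid; the maximum lands at p = 0.5 with H = 1 bit.
grid = [i / 100 for i in range(101)]
best = max(grid, key=binary_entropy)
print(best, binary_entropy(best))  # 0.5 1.0
```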
Information Rate (R)
For a source with symbol rate / message rate r
(unit: messages/sec)
and H = avg. number of bits per message:
R = rH, bits/sec
Examples
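As a worked arithmetic example of R = rH (the source probabilities and message rate below are illustrative, not from the notes):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical source: 4 symbols with these probabilities, emitted at r = 1000 messages/sec
probs = [0.5, 0.25, 0.125, 0.125]
r = 1000
H = entropy(probs)  # 1.75 bits/message
R = r * H           # information rate in bits/sec
print(H, R)         # 1.75 1750.0
```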
Extension of a DMS / discrete zero-memory source
Form blocks of symbols, each consisting of n
successive source symbols.
Such a source is called an extended source.
Source alphabet: S^n
Distinct blocks: K^n, where K = number of distinct
symbols in the alphabet
H(S^n) = n H(S)
Example: consider the second-order extended DMS
with the source alphabet S consisting of three
symbols s0, s1, s2 with probabilities of occurrence
P0 = , P1 = , P2 = . Calculate the entropy of the
extended source.
H(S^n) = H(S^2) = 2 H(S)   ...(1)
H(S) = (calculate) = 3/2 bits   ...(2)
Substituting (2) in (1):
H(S^2) = 3 bits
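The identity H(S^n) = n H(S) can be verified directly. The probabilities in the example above were lost in extraction; the values {1/2, 1/4, 1/4} assumed below are consistent with the stated H(S) = 3/2 bits:

```python
import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Assumed probabilities for s0, s1, s2 (consistent with H(S) = 3/2 bits)
probs = [0.5, 0.25, 0.25]

# Second-order extension: blocks of n = 2 symbols; block probability
# is the product of symbol probabilities, since the source is memoryless.
ext = [p * q for p, q in product(probs, repeat=2)]
print(entropy(probs))  # 1.5
print(entropy(ext))    # 3.0 = 2 * H(S)
```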
Source Coding
Classification
Source encoder fundamental requirements:
a) codewords in binary form
b) source codes uniquely decodable
Properties of a source encoder:
1) adds no additional information
2) does not destroy the information content
3) reduces the fluctuations in the information rate
4) avoids symbol surges
Source Coding Theorem - Shannon's First Theorem
Theorem: Given a DMS of entropy H(S), the average
codeword length L for any source encoding is
bounded as
L ≥ H(S)
H(S) is the fundamental limit on the avg. number of
bits/symbol (L) for representation of the DMS:
Lmin = H(S)
Coding efficiency: η(code) = Lmin / L = H(S) / L
Properties of source codes
A) Uniquely decodable: single possible meaning
B) A prefix code: no complete codeword is the prefix of any other codeword
Source decoder: starting from the beginning of the sequence, the decoder
decodes one codeword at a time,
using a decision tree (initial state, terminal states).
Decode the received sequence 010110111
a) Fixed-length code: fixed codeword length
-- codes 1 & 2
b) Variable-length code: variable codeword length
-- codes 3 to 6
c) Distinct code: each codeword is distinguishable from the others
-- all codes except 1
d) Prefix-free codes: no complete codeword is
the prefix of any other codeword
-- codes 2, 4, 6
e) Uniquely decodable codes: codes 2, 4, 6
f) Instantaneous codes: codes 2, 4, 6
g) Optimum codes: instantaneous, with minimum
avg. length L
Kraft-McMillan Inequality Criteria
If a DMS forms a prefix code, with source alphabet {S0,
S1, ..., SK-1}, source statistics {P0, P1, ..., PK-1}, and
codeword length lk for symbol Sk, then the codeword
lengths of the code satisfy the Kraft-McMillan inequality:
Σ 2^(-lk) ≤ 1   (sum over k = 0 to K-1)
Conversely, if the codeword lengths of a code for a DMS
satisfy the Kraft-McMillan inequality, then a prefix
code with these codeword lengths can be
constructed.
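The inequality is a one-line check in code; a minimal sketch (the length sets are illustrative, the first being the Huffman example appearing later in these notes):

```python
def satisfies_kraft(lengths) -> bool:
    """Check the Kraft-McMillan inequality: sum_k 2**(-lk) <= 1."""
    return sum(2.0 ** (-l) for l in lengths) <= 1.0

print(satisfies_kraft([2, 2, 2, 3, 3]))  # True: 3/4 + 1/4 = 1
print(satisfies_kraft([1, 1, 2]))        # False: 1/2 + 1/2 + 1/4 > 1
```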
* Given a DMS of entropy H(S), the average
codeword length L of a prefix code is bounded
as H(S) ≤ L < H(S) + 1
* When L = H(S), the prefix code is matched to the DMS.
Extended Prefix Code: to match an arbitrary DMS with a prefix code
For the nth extension of a code, a source encoder
operates on blocks of n samples, rather than
individual samples.
Ln = avg. codeword length of the extended prefix code. Then
H(S^n) ≤ Ln < H(S^n) + 1
n H(S) ≤ Ln < n H(S) + 1
H(S) ≤ Ln/n < H(S) + 1/n
For n → ∞, Ln/n → H(S).
By making the order n of the source large enough,
the DMS can be represented faithfully:
the avg. codeword length per source symbol of an extended prefix code
can be made as close to the entropy of the source as desired, provided
the extended code has a high enough order, in accordance with
Shannon's Source Coding Theorem.
Huffman Code - variable-length source code
For the Huffman code, L approaches the fundamental limit H(S).
The Huffman code is an optimum code:
no other uniquely decodable set of codewords has a
smaller avg. codeword length for the given DMS.
Algorithm
1. List the given source symbols in order of
decreasing probability.
- Assign 0 & 1 to the last two source symbols
- (splitting stage)
2. These last two source symbols in the sequence are
combined to form a new source symbol with
probability equal to the sum of the two original
probabilities.
- The probability of the new symbol is placed in the list
in accordance with its value.
- The list of source symbols is thus reduced by one
(reduction stage).
3. Repeat the procedure until the list contains a final set of
source statistics of only two, for which a 0 & a 1 are
assigned; this gives an optimum code.
4. Starting from the last code, work backward to
form an optimum code for the given symbols.
Example 1. Consider a DMS with source alphabet S
with symbols S0, S1, S2, S3, S4 with probabilities
0.4, 0.2, 0.2, 0.1, 0.1 respectively. Form the source
code using the Huffman algorithm & verify
Shannon's first theorem of source coding.
Stage probabilities (at each step the last two probabilities in the list
are combined, assigning 0 to the first & 1 to the second, and the sum
is re-inserted in the list):
Step I (splitting stage): 0.4, 0.2, 0.2, 0.1, 0.1
Step II (reduction): 0.4, 0.2, 0.2, 0.2
Step III: 0.4, 0.4, 0.2
Step IV: 0.6, 0.4
Ans:
Symbol Prob. Codeword codeword length lk
S0 0.4 00 2
S1 0.2 10 2
S2 0.2 11 2
S3 0.1 010 3
S4 0.1 011 3
L = Σ Pk lk   (sum over k = 0 to 4)
  = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3
  = 2.2 bits/symbol
H(S) = Σ Pk log2(1/Pk)   (sum over k = 0 to 4)
     = 2.12193 bits/symbol
As L ≥ H(S), the theorem is satisfied; Shannon's first theorem is verified.
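The procedure above can be sketched in code; this minimal implementation (the merge bookkeeping and tie-breaking are implementation choices, not from the notes) reproduces the codeword lengths of the worked example:

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths from the Huffman procedure: repeatedly merge the
    two least-probable entries, adding 1 bit to every symbol in the merge."""
    # Each heap entry: (probability, tie-breaker, list of symbol indices)
    heap = [(p, k, [k]) for k, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, t2, s2 = heapq.heappop(heap)
        for k in s1 + s2:  # every symbol in the merged group gains one bit
            lengths[k] += 1
        heapq.heappush(heap, (p1 + p2, t2, s1 + s2))
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(lengths, avg)  # lengths {2, 2, 2, 3, 3}; average length 2.2 bits/symbol
```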
Huffman code: the process is not unique.
When the probability of a combined symbol equals another probability
in the list, the probability of the new symbol may be placed
A) as high as possible, or
B) as low as possible.
Variance (σ^2)
To measure the variability in the codeword lengths of a source code;
it should be as low as possible:
σ^2 = Σ Pk (lk - L)^2   (sum over k = 0 to K-1)
Shannon Fano Coding
Principle: The codeword length increases, as the symbol
probability decreases.
Algorithm:-involves succession of divide & conquer steps
1. Divide symbols into two groups
- such that the group probabilities are as nearly equal as
possible
2. Assign the digit 0 to each symbol in the first group & digit
1 to each symbol in the second group
3. For all subsequent steps, subdivide each group into
subgroups & again repeat step 2.
4. Whenever a group contains just one symbol, no further
subdivision is possible & the codeword for that symbol is
complete.
- When all groups are reduced to one symbol, the codewords
are given by the assigned digits, reading from left to right.
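The divide-and-conquer steps above can be sketched recursively; a minimal version (the even-split search is one common way to realize step 1, and the symbol names are illustrative):

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), assumed sorted by decreasing
    probability. Returns {name: codeword} built by recursive splitting."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return  # single symbol: codeword complete
        total = sum(p for _, p in group)
        # Find the split point making the two group probabilities as equal as possible
        best_diff, cut = float("inf"), 1
        for i in range(1, len(group)):
            run = sum(p for _, p in group[:i])
            diff = abs(total - 2 * run)
            if diff < best_diff:
                best_diff, cut = diff, i
        for name, _ in group[:cut]:
            codes[name] += "0"  # digit 0 for each symbol in the first group
        for name, _ in group[cut:]:
            codes[name] += "1"  # digit 1 for each symbol in the second group
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

# Hypothetical source with dyadic probabilities: lengths come out as log2(1/p)
codes = shannon_fano([("a", 0.5), ("b", 0.25), ("c", 0.125), ("d", 0.125)])
print(codes)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```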