Chapter 6: Entropy and Shannon's First Theorem
Information
Existence: I(p) = log(1/p)
Units of information: in base 2, a bit; in base e, a nat; in base 10, a Hartley.
A quantitative measure of the amount of information any event represents. I(p) = the amount of information in the occurrence of an event of probability p.
Axioms:
A. I(p) ≥ 0 for any event of probability p
B. I(p1∙p2) = I(p1) + I(p2) when p1 and p2 are independent events (the Cauchy functional equation)
C. I(p) is a continuous function of p
Uniqueness: Suppose I′(p) satisfies the axioms. Since I′(p) ≥ 0, take any 0 < p0 < 1 and the base k = (1/p0)^(1/I′(p0)). Then k^I′(p0) = 1/p0, and hence logk(1/p0) = I′(p0). Now any z ∈ (0,1) can be written as p0^r for a real r ∈ R+ (r = log_p0 z). The Cauchy functional equation implies that I′(p0^n) = n∙I′(p0) and, for m ∈ Z+, I′(p0^(1/m)) = (1/m)∙I′(p0), which gives I′(p0^(n/m)) = (n/m)∙I′(p0); hence by continuity I′(p0^r) = r∙I′(p0). Therefore I′(z) = r∙logk(1/p0) = logk(1/p0^r) = logk(1/z).
Note: In this proof, we introduce an arbitrary p0, show how any z relates to it, and then eliminate the dependency on that particular p0.
Entropy
The average amount of information received on a per-symbol basis from a source S = {s1, …, sq} of symbols, where si has probability pi; it measures the information rate. In radix r, when all the probabilities are independent:
Hr(S) = Σi=1..q pi logr(1/pi) = logr Πi=1..q (1/pi)^pi
(the weighted arithmetic mean of the information; equivalently, the log of the weighted geometric mean of the 1/pi)
• Entropy is the amount of information in the probability distribution.
Alternative approach: consider a long message of N symbols from S = {s1, …, sq} with probabilities p1, …, pq. You expect si to appear Npi times, and the probability of this typical message is:
P = Πi=1..q pi^(N∙pi), whose information is log(1/P) = Σi=1..q N∙pi∙log(1/pi) = N∙H(S).
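As a quick numeric sanity check of both forms of H(S) and the typical-message information, here is a short sketch (the four-symbol distribution is an assumed example, not from the text):

```python
import math

def entropy(probs, base=2):
    """H(S) = sum of p_i * log_base(1/p_i): the average information per symbol."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # assumed example distribution
H = entropy(probs)

# The same value as the log of the weighted geometric mean of the 1/p_i:
H_geo = math.log2(math.prod((1.0 / p) ** p for p in probs))

# A typical N-symbol message (s_i appearing N*p_i times) has probability
# P = product of p_i^(N*p_i), so its information log2(1/P) equals N * H(S):
N = 1000
info_typical = sum(N * p * math.log2(1.0 / p) for p in probs)
```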
Consider f(p) = p ln(1/p) (this works for any base, not just e):
f′(p) = (−p ln p)′ = −p(1/p) − ln p = −1 + ln(1/p)
f″(p) = −1/p < 0 for p ∈ (0,1), so f is concave down.
lim p→0+ f(p) = lim p→0+ p ln(1/p) = lim p→0+ [ln(1/p)] / (1/p) = lim p→0+ (−1/p) / (−1/p²) = lim p→0+ p = 0 (by L'Hôpital's rule).
f(1) = 0; f′(0) = ∞, f′(1) = −1; f′(1/e) = 0 and f(1/e) = 1/e.
[Figure: graph of f(p) = p ln(1/p) on 0 ≤ p ≤ 1, rising from 0 to its maximum 1/e at p = 1/e and falling back to 0 at p = 1.]
Basic information about the logarithm function
Tangent line to y = ln x at x = 1: (y − ln 1) = (ln)′|x=1 ∙ (x − 1), i.e. y = x − 1.
(ln x)″ = (1/x)′ = −1/x² < 0, so ln x is concave down.
Conclusion: ln x ≤ x − 1, with equality only at x = 1.
[Figure: y = ln x lying below its tangent line y = x − 1, touching at (1, 0).]
• Minimum entropy occurs when one pi = 1 and all others are 0.
• Maximum entropy occurs when? Consider:
Let (x1, …, xq) and (y1, …, yq) be two probability distributions (Σi xi = Σi yi = 1), and consider
Σi=1..q xi log(yi/xi) ≤ (log e) Σi=1..q xi (yi/xi − 1) = (log e)(Σi yi − Σi xi) = (log e)(1 − 1) = 0,
using ln x ≤ x − 1; equality holds only when xi = yi for all i.
Now apply Gibbs with the distribution yi = 1/q:
H(S) − log q = Σi=1..q pi log(1/pi) − Σi=1..q pi log q = Σi=1..q pi log(1/(q∙pi)) ≤ 0.
Fundamental Gibbs inequality
• Hence H(S) ≤ log q, and equality occurs only when pi = 1/q.
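The Gibbs argument can be checked numerically; a minimal sketch, assuming a randomly generated 8-symbol distribution (not from the text), where the gap against the uniform distribution is exactly H(S) − log2 q ≤ 0:

```python
import math
import random

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def gibbs_gap(xs, ys):
    """sum of x_i * log2(y_i / x_i); the Gibbs inequality says this is <= 0
    for any two probability distributions, with equality iff x = y."""
    return sum(x * math.log2(y / x) for x, y in zip(xs, ys) if x > 0)

random.seed(1)
q = 8
raw = [random.random() for _ in range(q)]
p = [v / sum(raw) for v in raw]      # assumed random distribution
uniform = [1.0 / q] * q

# Taking y_i = 1/q, the gap equals H(S) - log2(q), which is <= 0:
gap = gibbs_gap(p, uniform)
H = entropy(p)
```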
Entropy Examples
S = {s1} p1 = 1 H(S) = 0 (no information)
S = {s1,s2} p1 = p2 = ½ H2(S) = 1 (1 bit per symbol)
S = {s1, …, sr} p1 = … = pr = 1/r Hr(S) = 1 but H2(S) = log2r.
• Run-length coding (for instance, in binary predictive coding):
p = 1 − q is the probability of a 0. H2(S) = p log2(1/p) + q log2(1/q)
As q → 0 the term q log2(1/q) dominates (compare slopes). Cf.:
the average run length is 1/q and the average number of bits needed is log2(1/q).
So q log2(1/q) = the average amount of information per bit of the original code.
Entropy as a Lower Bound for Average Code Length
Given an instantaneous code with lengths l1, …, lq in radix r, let L = Σi=1..q pi∙li be the average code length, and let K = Σi=1..q r^(−li); K ≤ 1 by the Kraft inequality. Set Qi = r^(−li)/K, so Σi Qi = 1. So by Gibbs, Σi pi logr(Qi/pi) ≤ 0; applying logr(1/Qi) = li + logr K:
Hr(S) = Σi=1..q pi logr(1/pi) ≤ Σi=1..q pi logr(1/Qi) = Σi=1..q pi (li + logr K) = L + logr K.
Since K ≤ 1, logr K ≤ 0, and hence Hr(S) ≤ L.
By the McMillan inequality, this holds for all uniquely decodable codes. Equality occurs when K = 1 (the decoding tree is complete) and pi = r^(−li).
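A small numeric check of the bound Hr(S) ≤ L; the four-symbol code below is a hypothetical example (not from the text) chosen to satisfy the Kraft inequality:

```python
import math

def entropy(probs, r=2):
    return sum(p * math.log(1.0 / p, r) for p in probs if p > 0)

# Hypothetical instantaneous binary code: lengths satisfy Kraft (K <= 1)
probs = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]

K = sum(2 ** -l for l in lengths)               # Kraft sum, here exactly 1
L = sum(p * l for p, l in zip(probs, lengths))  # average code length
H = entropy(probs)                              # entropy lower-bounds L
```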
Shannon-Fano Coding
The simplest variable-length method. It is less efficient than Huffman coding, but allows one to code symbol si with length li directly from the probability pi.
li = ⌈logr(1/pi)⌉
Since li = ⌈logr(1/pi)⌉,
logr(1/pi) ≤ li < logr(1/pi) + 1, i.e. 1/pi ≤ r^li < r/pi, so pi ≥ r^(−li) > pi/r.
Summing this inequality over i:
1 = Σi=1..q pi ≥ Σi=1..q r^(−li) = K > Σi=1..q pi/r = 1/r.
Kraft inequality is satisfied, therefore there is an instantaneous code with these lengths.
Also, summing li < logr(1/pi) + 1 multiplied by pi:
Hr(S) = Σi=1..q pi logr(1/pi) ≤ Σi=1..q pi li = L < Σi=1..q pi (logr(1/pi) + 1) = Hr(S) + 1.
Example: p’s: ¼, ¼, ⅛, ⅛, ⅛, ⅛; l’s: 2, 2, 3, 3, 3, 3; K = 1
H2(S) = 2.5, L = 5/2
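The example's numbers can be reproduced directly from the Shannon-Fano rule li = ⌈logr(1/pi)⌉; a minimal sketch:

```python
import math

def shannon_fano_lengths(probs, r=2):
    """l_i = ceil(log_r(1 / p_i))"""
    return [math.ceil(math.log(1.0 / p, r)) for p in probs]

probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
lengths = shannon_fano_lengths(probs)

K = sum(2 ** -l for l in lengths)               # Kraft sum (here exactly 1)
L = sum(p * l for p, l in zip(probs, lengths))  # average code length
H = sum(p * math.log2(1 / p) for p in probs)    # entropy; H <= L < H + 1
```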
[Figure: the complete binary decoding tree for this code; two leaves at depth 2 and four at depth 3.]
Recall: The nth extension of a source S = {s1, …, sq} with probabilities p1, …, pq is the set of symbols T = S^n = {si1 ∙∙∙ sin : sij ∈ S, 1 ≤ j ≤ n}, where ti = si1 ∙∙∙ sin has probability Qi = pi1 ∙∙∙ pin, assuming independent probabilities. Let i = (i1 − 1, …, in − 1)q + 1, an n-digit number base q.
The Entropy of Code Extensions
Concatenation of source symbols corresponds to multiplication of their probabilities. The entropy is:
H(S^n) = H(T) = Σi=1..q^n Qi log(1/Qi) = Σi=1..q^n Qi log(1/(pi1 ∙∙∙ pin)) = Σi=1..q^n Qi Σk=1..n log(1/pik) = Σk=1..n Σi=1..q^n Qi log(1/pik) = n∙H(S).
Consider the kth term:
Σi=1..q^n Qi log(1/pik) = Σi1=1..q ∙∙∙ Σin=1..q pi1 ∙∙∙ pin log(1/pik) = [Σik=1..q pik log(1/pik)] ∙ Πj≠k [Σij=1..q pij] = H(S),
since p̂ = pi1 ∙∙∙ pik−1 ∙ pik+1 ∙∙∙ pin is just a probability in the (n−1)st extension, and adding them all up gives 1. Hence each of the n terms equals H(S), and H(S^n) = n∙H(S).
H(S^n) = n∙H(S). Hence the average S-F code length Ln for T satisfies:
H(T) ≤ Ln < H(T) + 1, i.e. n∙H(S) ≤ Ln < n∙H(S) + 1, so
H(S) ≤ Ln/n < H(S) + 1/n [now let n go to infinity].
Extension Example
S = {s1, s2}, p1 = 2/3, p2 = 1/3
H2(S) = (2/3)log2(3/2) + (1/3)log2(3/1) ≈ 0.9182958…
Huffman: s1 = 0, s2 = 1. Avg. coded length = (2/3)∙1 + (1/3)∙1 = 1
Shannon-Fano: l1 = 1, l2 = 2. Avg. length = (2/3)∙1 + (1/3)∙2 = 4/3
2nd extension: p11 = 4/9, p12 = 2/9 = p21, p22 = 1/9. S-F:
l11 = ⌈log2(9/4)⌉ = 2, l12 = l21 = ⌈log2(9/2)⌉ = 3, l22 = ⌈log2(9/1)⌉ = 4
LSF(2) = avg. coded length = (4/9)∙2 + 2∙(2/9)∙3 + (1/9)∙4 = 24/9 ≈ 2.667
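A short script reproducing these example figures (H2(S), and the S-F averages for the source and its 2nd extension); `sf_len` is a hypothetical helper name:

```python
import math
import itertools

def sf_len(p):
    """Shannon-Fano length ceil(log2(1/p))."""
    return math.ceil(math.log2(1.0 / p))

p = {'s1': 2/3, 's2': 1/3}
H = sum(v * math.log2(1.0 / v) for v in p.values())   # ~ 0.9182958

# First-order Shannon-Fano: average length per symbol
L1 = sum(v * sf_len(v) for v in p.values())

# 2nd extension: code pairs; pair probabilities multiply
L2 = sum(p[a] * p[b] * sf_len(p[a] * p[b])
         for a, b in itertools.product(p, repeat=2))
```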
S^n = (s1 + s2)^n: the probabilities of the extension symbols are the corresponding terms in
(p1 + p2)^n = Σi=0..n C(n,i) p1^i p2^(n−i).
So there are C(n,i) symbols, each with probability (2/3)^i (1/3)^(n−i).
The corresponding SF length for such a symbol is
⌈log2(3^n/2^i)⌉ = ⌈n∙log2 3 − i⌉,
so, writing (2/3)^i (1/3)^(n−i) = 2^i/3^n,
LSF(n) = Σi=0..n C(n,i) (2^i/3^n) ⌈n∙log2 3 − i⌉ = Σi=0..n C(n,i) (2^i/3^n) (n∙log2 3 − i) + ε with 0 ≤ ε < 1.
Now (2 + 1)^n = 3^n gives Σi=0..n C(n,i) 2^i = 3^n, and differentiating (x + 1)^n = Σi=0..n C(n,i) x^i, multiplying by x, and setting x = 2 gives Σi=0..n i∙C(n,i) 2^i = 2n∙3^(n−1). Hence
LSF(n) = n∙log2 3 − (2n∙3^(n−1))/3^n + ε = n(log2 3 − 2/3) + ε = n∙H2(S) + ε,
so LSF(n)/n → log2 3 − 2/3 = H2(S) as n → ∞.
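The convergence LSF(n)/n → H2(S) can be observed numerically using the binomial structure of the extension; a sketch (`sf_rate` is a hypothetical helper):

```python
import math

def sf_rate(n, p1=2/3, p2=1/3):
    """Average Shannon-Fano length per source symbol for the nth extension
    of the two-symbol source, grouping blocks by their count i of s1's."""
    total = 0.0
    for i in range(n + 1):
        prob = math.comb(n, i) * p1**i * p2**(n - i)   # total prob of such blocks
        length = math.ceil(n * math.log2(3) - i)       # ceil(log2(1/p)) per block
        total += prob * length
    return total / n

H = math.log2(3) - 2/3          # entropy of the p = (2/3, 1/3) source
rates = [sf_rate(n) for n in (1, 10, 100)]
```

For n = 1 this recovers the 4/3 from the example above, and the per-symbol rate is squeezed between H and H + 1/n.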
Markov Process Entropy
For an mth-order Markov process, p(si | si1, …, sim) is the conditional probability that si follows si1 ∙∙∙ sim; think of letting the state be s = (si1, …, sim) ∈ S^m. Then
I(si | s) = log(1/p(si | s)), and so
H(S | s) = Σ si∈S p(si | s) I(si | s) = Σ si∈S p(si | s) log(1/p(si | s)).
Now let p(s) be the (equilibrium) probability of being in state s. Then
H(S) = Σ s∈S^m p(s) H(S | s) = Σ s∈S^m Σ si∈S p(s) p(si | s) log(1/p(si | s)) = Σ s∈S^m Σ si∈S p(s, si) log(1/p(si | s)).
Example
si1 si2 si | p(si | si1, si2) | p(si1, si2) | p(si1, si2, si)
0   0   0  | 0.8              | 5/14        | 4/14
0   0   1  | 0.2              | 5/14        | 1/14
0   1   0  | 0.5              | 2/14        | 1/14
0   1   1  | 0.5              | 2/14        | 1/14
1   0   0  | 0.5              | 2/14        | 1/14
1   0   1  | 0.5              | 2/14        | 1/14
1   1   0  | 0.2              | 5/14        | 1/14
1   1   1  | 0.8              | 5/14        | 4/14

H(S) = Σ (si1, si2, si)∈{0,1}³ p(si1, si2, si) log2(1/p(si | si1, si2))
= (8/14)∙log2(1/0.8) + (2/14)∙log2(1/0.2) + (4/14)∙log2(1/0.5) ≈ 0.801377
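The table computation can be replayed in a few lines; the dictionary below transcribes the table's conditional and joint probabilities:

```python
import math

# (previous bit pair, next bit) -> (conditional prob, joint prob), from the table:
rows = {
    (0, 0, 0): (0.8, 4/14), (0, 0, 1): (0.2, 1/14),
    (0, 1, 0): (0.5, 1/14), (0, 1, 1): (0.5, 1/14),
    (1, 0, 0): (0.5, 1/14), (1, 0, 1): (0.5, 1/14),
    (1, 1, 0): (0.2, 1/14), (1, 1, 1): (0.8, 4/14),
}

total_joint = sum(joint for _, joint in rows.values())   # should be 1

# H(S) = sum over (state, symbol) of p(state, symbol) * log2(1/p(symbol | state)):
H = sum(joint * math.log2(1.0 / cond) for cond, joint in rows.values())
```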
[Figure: four-state transition diagram over the states (0,0), (0,1), (1,0), (1,1) (previous state → next state), with transition probabilities .8, .2, .5, .5 as in the table.]
Equilibrium probabilities: p(0,0) = 5/14 = p(1,1), p(0,1) = 2/14 = p(1,0).
The Fibonacci numbers
Let f0 = 1, f1 = 2, f2 = 3, f3 = 5, f4 = 8, … be defined by fn+1 = fn + fn−1. Let φ = the golden ratio, a root of the equation x² = x + 1. Use these as the weights for a system of number representation with digits 0 and 1, without adjacent 1's (because (100)φ = (11)φ).
Base Fibonacci
Representation Theorem: every number from 0 to fn − 1 can be uniquely written as an n-bit number with no adjacent ones.
Existence: Basis: n = 0, 0 ≤ i ≤ 0; 0 = ()φ = ε, the empty string.
Induction: Let 0 ≤ i < fn+1. If i < fn, we are done by the induction hypothesis. Otherwise fn ≤ i < fn+1 = fn−1 + fn, so 0 ≤ i − fn < fn−1, which is uniquely representable as i − fn = (bn−2 … b0)φ with bi ∈ {0, 1} and ¬(bi = bi+1 = 1). Hence i = (10bn−2 … b0)φ, which also has no adjacent ones.
Uniqueness: Let i be the smallest number ≥ 0 with two distinct representations (no leading zeros): i = (bn−1 … b0)φ = (b′n−1 … b′0)φ. By the minimality of i, bn−1 ≠ b′n−1, so without loss of generality let bn−1 = 1 and b′n−1 = 0. This implies (b′n−2 … b′0)φ ≥ fn−1, which can't be true for an (n−1)-bit representation with no adjacent ones.
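The existence argument is effectively a greedy algorithm; a sketch (helper names are hypothetical) that checks the representation theorem for n = 8, i.e. for all 0 ≤ i < f8 = 55:

```python
def fib_weights(n):
    """f0 = 1, f1 = 2, f2 = 3, f3 = 5, ...  (f_{k+1} = f_k + f_{k-1})"""
    w = [1, 2]
    while len(w) < n:
        w.append(w[-1] + w[-2])
    return w[:n]

def to_base_fib(i, n):
    """Greedy version of the existence proof: write 0 <= i < f_n as an
    n-bit string over weights f_{n-1}, ..., f_0 with no adjacent 1s."""
    bits = []
    for f in reversed(fib_weights(n)):
        if i >= f:
            bits.append('1')
            i -= f
        else:
            bits.append('0')
    return ''.join(bits)

n = 8
f_n = fib_weights(n + 1)[-1]               # f_8 = 55
reps = [to_base_fib(i, n) for i in range(f_n)]
```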
The golden ratio φ = (1 + √5)/2 is a solution to x² − x − 1 = 0 and is equal to the limit of the ratio of adjacent Fibonacci numbers.
[Figure: a source with r equally likely symbols 0, …, r − 1, each of probability 1/r, has H2 = log2 r.]
1st-order Markov process:
From state 0: emit 0 with probability 1/φ (stay in state 0), or emit 1 with probability 1/φ² (go to state 1). From state 1: emit 0 with probability 1 (return to state 0). Note 1/φ + 1/φ² = 1.
Think of the source as emitting variable-length symbols: 0 with probability 1/φ and 10 with probability 1/φ².
Entropy = (1/φ)∙log φ + ½(1/φ²)∙log φ² = log φ, which is maximal; the factor ½ takes into account that the symbol 10 has length 2.
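As a check, the entropy per emitted bit of this source comes out to log2 φ ≈ 0.694; here it is computed as symbol entropy divided by average symbol length, which is an equivalent bookkeeping to the ½ factor:

```python
import math

phi = (1 + math.sqrt(5)) / 2     # golden ratio: phi**2 == phi + 1
p0  = 1 / phi                    # probability of emitting symbol "0"
p10 = 1 / phi**2                 # probability of emitting symbol "10"

# Entropy of the two variable-length symbols, divided by the average
# emitted length, gives the information rate per bit:
H_sym   = p0 * math.log2(1 / p0) + p10 * math.log2(1 / p10)
avg_len = p0 * 1 + p10 * 2
rate = H_sym / avg_len           # equals log2(phi)
```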