ENTROPY
(Notes by Glenn Mertens)

THE BASICS OF INFORMATION THEORY
Shannon's theory from 1948.

Outline:
Shannon's view
Lower bounds for compression
Entropy E = Σ_i p_i log_2(1/p_i)
The global view
Back to prefix codes
Lempel-Ziv compression
Shannon's view

Compression is represented as a (lossless compression) path in a trie: each possible input file is assigned a leaf of one (huge) binary tree, and the compressed file is the bit string describing the root-to-leaf path (0 = go left, 1 = go right). Lossless means distinct input files get distinct leaves, so the input can always be recovered.
Expected length of compressed file:
    Σ_i p_i l_i,
where p_i = probability of seeing input file i, and l_i = length (in bits) of the compressed version of file i.
So, the best compression method, given the p_i's, is the Huffman code.
BUT ... the tree is too large!
The p_i's are often not known precisely.
Nevertheless, we know a lot about the best compression method:

Shannon's theorem:
    E  ≤  min over all binary trees of Σ_i p_i l_i  ≤  E + 1,
where E = Σ_i p_i log_2(1/p_i) = (binary) entropy.
KRAFT'S INEQUALITY
Let l_i be the depths of the leaves in a binary tree. Then
    Σ_i 2^{-l_i} ≤ 1.
Proof: by induction (exercise).
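To see the inequality numerically, a small sketch (the leaf-depth lists below are illustrative tree shapes, not from the notes):

```python
# Kraft's inequality: for the leaf depths l_i of any binary tree,
# sum_i 2^(-l_i) <= 1 (with equality when every internal node has two children).

def kraft_sum(depths):
    return sum(2 ** -l for l in depths)

# A complete tree with four leaves at depth 2:
assert kraft_sum([2, 2, 2, 2]) == 1.0
# A lopsided tree with leaves at depths 1, 2, 3, 3:
assert kraft_sum([1, 2, 3, 3]) == 1.0
# Pruning a leaf leaves slack, hence strict inequality:
assert kraft_sum([1, 2, 3]) < 1
```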
PROOF OF SHANNON'S THEOREM
(LOWER BOUND)
    Σ_i p_i l_i = -Σ_i p_i log_2 2^{-l_i}
                = -Σ_i p_i log_2(2^{-l_i}/p_i) + Σ_i p_i log_2(1/p_i)
                = -Σ_i p_i log_2(2^{-l_i}/p_i) + E
                ≥ -(1/ln 2) Σ_i p_i (2^{-l_i}/p_i - 1) + E    [since ln x ≤ x - 1 for all x > 0]
                = -(1/ln 2) (Σ_i 2^{-l_i} - 1) + E
                ≥ E                                           [by Kraft's inequality].
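The lower bound can be sanity-checked numerically; the distribution and the depth lists below are illustrative, chosen so that Kraft's inequality holds:

```python
import math

# For any leaf depths satisfying Kraft's inequality, the expected depth
# sum_i p_i * l_i is at least the entropy E = sum_i p_i * log2(1/p_i).

def entropy(p):
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]
E = entropy(p)  # 1.75 bits

for depths in ([1, 2, 3, 3], [2, 2, 2, 2], [1, 2, 4, 4]):
    assert sum(2 ** -l for l in depths) <= 1            # Kraft holds
    expected = sum(pi * l for pi, l in zip(p, depths))
    assert expected >= E - 1e-12                        # lower bound holds

# Equality when l_i = log2(1/p_i) exactly (a dyadic distribution):
assert abs(sum(pi * l for pi, l in zip(p, [1, 2, 3, 3])) - E) < 1e-12
```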
(UPPER BOUND)
By the converse of Kraft's inequality: if Σ_i 2^{-l_i} ≤ 1, then there exists a binary tree with leaves at these depths (exercise).
So, given the p_i's, set
    l_i = ⌈log_2(1/p_i)⌉.
Then
    Σ_i 2^{-l_i} ≤ Σ_i p_i = 1.
So, we can use these l_i to make a tree. Let that tree define the code (called the Shannon-Fano code). The expected length is
    Σ_i p_i l_i = Σ_i p_i ⌈log_2(1/p_i)⌉ ≤ Σ_i p_i log_2(1/p_i) + Σ_i p_i = E + 1.
So, there is a code with expected length ≤ E + 1.
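The Shannon-Fano lengths from the proof can be checked directly; the distribution below is an illustrative example:

```python
import math

# Shannon-Fano code lengths: l_i = ceil(log2(1/p_i)). They satisfy
# Kraft's inequality (so a tree exists) and give expected length <= E + 1.

def shannon_fano_lengths(p):
    return [math.ceil(math.log2(1 / pi)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_fano_lengths(p)            # [2, 2, 3, 4]

E = sum(pi * math.log2(1 / pi) for pi in p)
assert sum(2 ** -l for l in lengths) <= 1    # converse of Kraft applies
expected = sum(pi * l for pi, l in zip(p, lengths))
assert E <= expected <= E + 1                # Shannon's theorem
```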
The global view
Compressing entire files as above is unrealistic. If the input consists of n independent symbols from an alphabet A, with symbol probabilities p_i, then each symbol can be coded via Huffman, and we have a total expected length
    ≤ n × (E + 1),   where E = entropy of one symbol.
Lower bound: ≥ n × E (entropy of the file = sum of the entropies of the symbols).
So the overhead is at most 1 bit per symbol. It helps to group the symbols in groups of k (a small number) and Huffman-code each group: the overhead drops to at most 1 bit per group, i.e., 1/k bit per symbol.
Or, one could use Lempel-Ziv compression (see later).

Solution 1: Huffman coding on groups of characters.
E.g.: group the letters in sets of 3: (abc), (bca), ...
Get the p_i's by counting occurrences in a file.
Construct the Huffman code.
Code and decode as for prefix codes.
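A minimal sketch of Solution 1 (assumed details: a toy input, groups of 3, and a standard heap-based Huffman construction; only code lengths are computed, not the actual bit strings):

```python
import heapq
from collections import Counter
from itertools import count

def huffman_lengths(freqs):
    # Standard Huffman construction: repeatedly merge the two least
    # frequent groups; each merge pushes its symbols one level deeper.
    tick = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(f, next(tick), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in freqs}
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tick), s1 + s2))
    return depth

text = "abcbcaabcabcaab"                      # toy input file
blocks = [text[i:i+3] for i in range(0, len(text), 3)]
lengths = huffman_lengths(Counter(blocks))    # the p_i's come from counting
assert lengths == {"abc": 1, "bca": 2, "aab": 2}
```

Frequent blocks receive short codes, and grouping spreads the per-group rounding overhead across the symbols in the group.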
RECALL (decoding a prefix code)
Start at the root and descend to a leaf. Decode. Repeat.
Time to decode = length of the compressed sequence.

The decoder: given the compressed sequence s, and given the binary tree with the code: its root is t; its leaves contain symbols of the alphabet A.

    x ← t    (traveling pointer in t)
    while |s| > 0:
        b ← get next bit from s
        if b = 0 then x ← left[x] else x ← right[x]
        if x is a leaf then output key[x]; x ← t
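The pseudocode above, translated into Python as a sketch (the nested-dict tree representation and the example code {a → 0, b → 10, c → 11} are illustrative):

```python
# Prefix-code decoder: walk the tree bit by bit; on reaching a leaf,
# output its symbol and jump back to the root.

def decode(bits, root):
    out = []
    x = root                       # traveling pointer, starts at the root
    for b in bits:
        x = x["left"] if b == "0" else x["right"]
        if "symbol" in x:          # reached a leaf
            out.append(x["symbol"])
            x = root               # restart at the root
    return "".join(out)

# Code: a -> 0, b -> 10, c -> 11.
tree = {"left": {"symbol": "a"},
        "right": {"left": {"symbol": "b"}, "right": {"symbol": "c"}}}
assert decode("0101011", tree) == "abbc"
```

Each input bit moves the pointer exactly once, so the time to decode is indeed the length of the compressed sequence.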
LEMPEL-ZIV COMPRESSION (Lempel and Ziv, 1977) (Solution 2)
Examples: zip, jpeg, most compression methods.
Feature: under generous assumptions on the input file, the expected length of the compressed sequence is close to E.
Method: parse the input into the smallest pieces never seen before.
[Worked example: an input string is parsed into pieces numbered 0 to 12; piece 0 is the empty piece, and each later piece equals an earlier piece (its "front") plus one new last symbol.]
THE BINARY SEQUENCE
For the k-th piece, we have:
    a pointer: an integer in {0, ..., k-1}; it needs ⌈log_2 k⌉ bits;
    a symbol from the alphabet; it needs a fixed number of bits: ⌈log_2 |A|⌉.

    piece #      1  2  3  4  5  6  7  8  9  10  11  12
    ⌈log_2 k⌉    0  1  2  2  3  3  3  3  4   4   4   4

In the output, all bits are clearly identified (all lengths are known in advance, so no separators are needed).
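The parsing can be sketched as follows (LZ78-style; the input string and the handling of a leftover final piece are illustrative choices):

```python
# Parse the input into the smallest pieces never seen before.
# The k-th piece is encoded as (pointer to an earlier piece, new last symbol).

def lz_parse(text):
    dictionary = {"": 0}           # piece 0 is the empty piece
    pieces, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch          # still equal to an earlier piece; extend
        else:
            pieces.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                    # input ended inside an already-seen piece
        pieces.append((dictionary[current[:-1]], current[-1]))
    return pieces

# "aababbba" parses as: a | ab | abb | b | a
assert lz_parse("aababbba") == [(0, "a"), (1, "b"), (2, "b"), (0, "b"), (0, "a")]
```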
DATA STRUCTURE FOR CODING / DECODING: THE DIGITAL SEARCH TREE
(an "|A|-ary trie")

[Figure: the pieces of the example input, numbered 0 to 12, stored in a digital search tree; each node is a piece, and each edge is labeled with a symbol.]

In the parsing phase: start at the root and descend, following the input symbols, until you fall off the tree; add one symbol (and a new node) to add a piece.
Exercise: if the input is of size n, then write the parsing algorithm that produces
(a) the tree, and
(b) the sequence (0,a)(1,a)(0,b)...,
and show that it takes time O(n).
In the decoding phase: keep a table of pointers: for each piece number, store its pointer and its last symbol. To decode a piece, follow the pointers back to piece 0, collecting the symbols.
For example, with piece 6 = (0, c) and piece 10 = (6, a), piece 10 decodes as (6, a) → ((0, c), a) → "ca".
Exercise: write an O(n) algorithm for decoding.
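A sketch of the decoder, using a Python list as the table (the example pairs are illustrative):

```python
# Decode: piece k = (decoded piece at its pointer) + its last symbol.

def lz_decode(pairs):
    pieces = [""]                  # piece 0 is the empty piece
    for pointer, symbol in pairs:
        pieces.append(pieces[pointer] + symbol)
    return "".join(pieces)

pairs = [(0, "a"), (1, "b"), (2, "b"), (0, "b"), (0, "a")]
assert lz_decode(pairs) == "aababbba"
```

Note that since the pieces partition the input, the total copying done by the concatenations is proportional to the input size, so this table-based approach runs in time O(n).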