Transcript of "Lossless Source Coding Algorithms" (Frans M.J. Willems, ISIT 2013), 102 slides; accessed 8/18/2019.

LOSSLESS SOURCE CODING ALGORITHMS

    Frans M.J. Willems

    INTRODUCTION

    HUFFMAN-TUNSTALL

    Binary IID Sources

    Huffman Code

    Tunstall Code

    ENUMERATIVE CODING

    Lexicographical Ordering

    FV: Pascal-∆ Method

    VF: Petry Code

    ARITHMETIC CODING

    Intervals

    Universal Coding, Individual Redundancy

    CONTEXT-TREE WEIGHTING

    IID, unknown θ

    Tree Sources

    Context Trees

    Coding Probs., Redundancy

    REPETITION TIMES

    LZ77

    Repetition Times, Kac

    Repetition-Time Algorithm

    Achieving Entropy

    CONCLUSION

    Lossless Source Coding Algorithms

    Frans M.J. Willems

    Department of Electrical Engineering, Eindhoven University of Technology

    ISIT 2013, Istanbul, Turkey


    Choosing a Topic

    POSSIBLE TOPICS:

    Multi-user Information Theory (with Edward van der Meulen (KUL), Andries Hekstra)

    Lossless Source Coding (with Tjalling Tjalkens, Yuri Shtarkov (IPPI), Paul Volf)

    Watermarking, Embedding, and Semantic Coding (with Martin van Dijk, Ton Kalker (Philips Research))

    Biometrics (with Tanya Ignatenko)

    LOSSLESS SOURCE CODING ALGORITHMS: WHY?

    Not many sessions at ISIT 2012! Is lossless source coding DEAD? Lossless source coding is about UNDERSTANDING data. Universal lossless source coding focuses on FINDING STRUCTURE in data. MDL principle [Rissanen].

    ALGORITHMS are fun (Piet Schalkwijk).


    Lecture Structure

    TUTORIAL: binary case, my favorite algorithms, ...

    REMARKS: open problems, ...


    Outline

    1   INTRODUCTION

    2   HUFFMAN and TUNSTALL
        Binary IID Sources; Huffman Code; Tunstall Code

    3   ENUMERATIVE CODING
        Lexicographical Ordering; FV: Pascal-∆ Method; VF: Petry Code

    4   ARITHMETIC CODING
        Intervals; Universal Coding, Individual Redundancy

    5   CONTEXT-TREE WEIGHTING
        IID, unknown θ; Binary Tree-Sources; Context Trees; Coding Probabilities

    6   REPETITION TIMES
        LZ77; Repetition Times, Kac; Repetition-Time Algorithm; Achieving Entropy

    7   CONCLUSION


    Binary Sources, Sequences, IID

    The binary source produces a sequence $x_1^N = x_1 x_2 \cdots x_N$ with components in $\{0, 1\}$ with probability $P(x_1^N)$.

    Definition (Binary IID Source)

    For an independent identically distributed (i.i.d.) source with parameter $\theta$, for $0 \le \theta \le 1$,
    $$P(x_1^N) = \prod_{n=1}^{N} P(x_n),$$
    where $P(1) = \theta$ and $P(0) = 1 - \theta$.

    A sequence $x_1^N$ containing $N - w$ zeros and $w$ ones has probability $P(x_1^N) = (1-\theta)^{N-w}\,\theta^w$.

    Entropy of the IID Source

    The ENTROPY of this source is $h(\theta) \triangleq (1-\theta)\log_2\frac{1}{1-\theta} + \theta\log_2\frac{1}{\theta}$ (bits).
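The quantities on this slide are easy to check numerically; a minimal sketch (the helper names `h` and `prob` are mine, not from the slides):

```python
from math import log2

def h(theta):
    """Binary entropy h(theta) in bits; h(0) = h(1) = 0 by convention."""
    if theta in (0.0, 1.0):
        return 0.0
    return (1 - theta) * log2(1 / (1 - theta)) + theta * log2(1 / theta)

def prob(seq, theta):
    """P(x_1^N) = (1-theta)^(N-w) * theta^w for a 0/1 sequence with w ones."""
    w = sum(seq)
    return (1 - theta) ** (len(seq) - w) * theta ** w

print(round(h(0.3), 3))                  # 0.881, the value used later for N = 3
print(round(prob([1, 0, 0], 0.3), 3))    # 0.3 * 0.7 * 0.7 = 0.147
```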


    Fixed-to-Variable (FV) Length Codes

    IDEA:

    Give more probable sequences shorter codewords than less probable sequences.

    Definition (FV-Length Code)

    A FV-length code assigns to source sequence $x_1^N$ a binary codeword $c(x_1^N)$ of length $L(x_1^N)$. The rate of a FV code is
    $$R \triangleq \frac{E[L(X_1^N)]}{N} \text{ (code-symbols/source-symbol)}.$$

    GOAL:

    We would like to find decodable FV-length codes that MINIMIZE this rate.


    Prefix Codes

    Definition (Prefix code)

    In a prefix code no codeword is the prefix of any other codeword.

    We focus on prefix codes. Codewords in a prefix code can be regarded as leaves in a rooted tree. Prefix codes lead to instantaneous decodability.

    Example

    x_1^N   c(x_1^N)   L(x_1^N)
    00      0          1
    01      10         2
    10      110        3
    11      111        3

    (Figure: the codewords 0, 10, 110, 111 as leaves of a rooted binary tree.)
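Instantaneous decodability of the example code can be demonstrated directly; a small sketch (the `decode` helper is mine):

```python
# The example prefix code: source block -> codeword.
code = {"00": "0", "01": "10", "10": "110", "11": "111"}
decode_map = {cw: x for x, cw in code.items()}

def decode(bits):
    """Decode a bit string; since no codeword is a prefix of another,
    each codeword can be emitted as soon as it is recognized."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_map:        # a complete codeword was recognized
            out.append(decode_map[buf])
            buf = ""
    assert buf == "", "bit string ended mid-codeword"
    return "".join(out)

print(decode("0" + "110" + "111"))   # '001011'
```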


    Prefix Codes (cont.)

    Theorem (Kraft, 1949)

    (a) The lengths of the codewords in a prefix code satisfy Kraft's inequality
    $$\sum_{x_1^N \in \mathcal{X}^N} 2^{-L(x_1^N)} \le 1.$$

    (b) For codeword lengths satisfying Kraft's inequality there exists a prefix code with these lengths.

    This leads to:

    Theorem (Fano, 1961)

    (a) Any prefix code satisfies
    $$E[L(X_1^N)] \ge H(X_1^N) = N h(\theta),$$
    or equivalently $R \ge h(\theta)$. The minimum is achieved if and only if $L(x_1^N) = -\log_2 P(x_1^N)$ (ideal codeword length) for all $x_1^N \in \mathcal{X}^N$ with nonzero $P(x_1^N)$.

    (b) There exist prefix codes with
    $$E[L(X_1^N)] < H(X_1^N) + 1 = N h(\theta) + 1,$$
    or equivalently $R < h(\theta) + 1/N$.
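Both parts of the Kraft theorem can be checked for the examples on these slides; a sketch (the variable names are mine):

```python
from itertools import product
from math import ceil, log2

# (a) Kraft's inequality for the 4-codeword prefix code of the earlier
# example: lengths 1, 2, 3, 3 give 2^-1 + 2^-2 + 2^-3 + 2^-3 = 1.
lengths = [1, 2, 3, 3]
print(sum(2 ** -L for L in lengths))  # 1.0

# (b) Rounding the ideal lengths -log2 P(x) up to integers keeps Kraft's
# inequality satisfied, so a prefix code with E[L] < H + 1 exists.
# Checked here for N = 3, theta = 0.3.
theta = 0.3
probs = [(1 - theta) ** (3 - sum(x)) * theta ** sum(x)
         for x in product([0, 1], repeat=3)]
rounded_up = [ceil(-log2(p)) for p in probs]
print(sum(2 ** -L for L in rounded_up) <= 1)  # True
```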


    Huffman's Code

    Definition (Optimal FV-length Code)

    A code that minimizes the expected codeword length $E[L(X_1^N)]$ (and the rate $R$) is called optimal.

    Theorem (Huffman, 1952)

    The Huffman construction leads to an optimal FV-length code.

    CONSTRUCTION:

    Consider the set of probabilities $\{P(x_1^N), x_1^N \in \mathcal{X}^N\}$.

    Replace the two smallest probabilities by a probability which is their sum. Label the branches from these two smallest probabilities to their sum with code-symbols "0" and "1".

    Continue like this until only one probability (equal to 1) is left.

    Obviously Huffman's code results in $E[L(X_1^N)] < H(X_1^N) + 1 = N h(\theta) + 1$, and therefore $R < h(\theta) + 1/N$.


    Huffman's Construction

    Example

    Let $N = 3$ and $\theta = 0.3$; then $h(0.3) = 0.881$.

    (Figure: the eight sequence probabilities are .343, three times .147, three times .063, and .027. Repeatedly merging the two smallest probabilities, with branches labeled 0 and 1, gives the intermediate nodes .090, .126, .216, .294, .363, .637, and finally 1.00.)

    Now $E[L(X_1^N)] = 4(.027 + .063 + .063 + .063) + 3(.147 + .147) + 2(.147 + .343) = 2.726$. Therefore $R = 2.726/3 = 0.909$.
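The slide's tree can be reproduced with a priority queue; a sketch of the merging procedure that tracks only codeword lengths, which is all the rate depends on:

```python
import heapq

# N = 3, theta = 0.3: eight sequence probabilities from the slide.
probs = [0.343] + [0.147] * 3 + [0.063] * 3 + [0.027]

# Huffman construction: repeatedly merge the two smallest probabilities.
# Each heap entry is (probability, list of leaf indices under that node);
# every merge adds one bit to the depth of all leaves involved.
depth = [0] * len(probs)
heap = [(p, [i]) for i, p in enumerate(probs)]
heapq.heapify(heap)
while len(heap) > 1:
    p1, l1 = heapq.heappop(heap)
    p2, l2 = heapq.heappop(heap)
    for i in l1 + l2:
        depth[i] += 1
    heapq.heappush(heap, (p1 + p2, l1 + l2))

EL = sum(p * d for p, d in zip(probs, depth))
print(round(EL, 3), round(EL / 3, 3))  # 2.726 0.909
```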


    Remarks: Huffman Code

    Note that $R \downarrow h(\theta)$ when $N \to \infty$.

    Always $E[L(X_1^N)] \ge 1$. For $\theta \approx 0$ a Huffman code has expected codeword length $E[L(X_1^N)] \approx 1$ and rate $R \approx 1/N$.

    Better bounds exist for Huffman codes than $E[L(X_1^N)] < H(X_1^N) + 1$. E.g., Gallager [1978] showed that
    $$E[L(X_1^N)] - H(X_1^N) \le \max_{x_1^N} P(x_1^N) + 0.086.$$

    Adaptive Huffman Codes (Gallager [1978]).


    Variable-to-Fixed (VF) Length Codes

    IDEA:

    Parse the source output into variable-length segments of roughly the same probability. Code all these segments with codewords of fixed length.

    Definition (VF-Length Code)

    A VF-length code is defined by a set of variable-length source segments. Each segment $x^*$ in the set gets a unique binary codeword $c(x^*)$ of length $L$. The length of a segment $x^*$ is denoted as $N(x^*)$. The rate of a VF code is
    $$R \triangleq \frac{L}{E[N(X^*)]} \text{ (code-symbols/source-symbol)}.$$

    GOAL:

    We would like to find parsable VF-length codes that MINIMIZE this rate.


    Proper-and-Complete Segment Sets

    Definition (Proper-and-Complete Segment Sets)

    A set of source segments is proper-and-complete if each semi-infinite source sequence has a unique prefix in this segment set.

    We focus on proper-and-complete segment sets. Segments in a proper-and-complete set can be regarded as leaves in a rooted tree. Such sets guarantee instantaneous parsability.

    Example

    x*     N(x*)   c(x*)
    1      1       11
    01     2       10
    001    3       01
    000    3       00

    (Figure: the segments 1, 01, 001, 000 as leaves of a rooted binary tree with internal nodes 0 and 00.)
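Parsing with this segment set can be exercised directly; a sketch (the `parse` helper is mine):

```python
# The proper-and-complete segment set from the table, with its
# fixed-length (L = 2) codewords.
codeword = {"1": "11", "01": "10", "001": "01", "000": "00"}

def parse(bits):
    """Split a source stream into segments; by properness-and-completeness
    each semi-infinite sequence has exactly one prefix in the set."""
    segments, buf = [], ""
    for b in bits:
        buf += b
        if buf in codeword:
            segments.append(buf)
            buf = ""
    return segments

segs = parse("1" + "01" + "000" + "1")
print(segs)                                 # ['1', '01', '000', '1']
print("".join(codeword[s] for s in segs))   # '11100011'
```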


    Proper-and-Complete Segment Sets: Leaf-Node Lemma

    Assume that the source is IID with parameter $\theta$. Consider a set of segments and all their prefixes, and depict them in a tree. The segments are leaves, the prefixes nodes. Note that all the nodes and leaves have a probability, e.g. $P(10) = \theta(1-\theta)$. Let $F(\cdot)$ be a function on nodes and leaves.

    (Figure: a rooted tree with root $F(\emptyset)$ and sons $F(0)$, $F(1)$; node $F(1)$ has sons $F(10)$, $F(11)$; node $F(10)$ has sons $F(100)$, $F(101)$.)

    Lemma (Massey, 1983)

    $$\sum_{l \in \mathrm{leaves}} P(l)\,[F(l) - F(\emptyset)] = \sum_{n \in \mathrm{nodes}} P(n) \sum_{s \in \mathrm{sons\ of}\ n} \frac{P(s)}{P(n)}\,[F(s) - F(n)].$$

    Let $F(x^*)$ be the number of edges from $x^*$ to the root; then
    $$E[N(X^*)] = \sum_{x^* \in \mathrm{nodes}} P(x^*).$$

    Let $F(x^*) = -\log_2 P(x^*)$; then
    $$H(X^*) = E[N(X^*)]\,h(\theta).$$
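Both consequences of the lemma can be verified on the earlier example segment set {1, 01, 001, 000}; a numerical sketch (variable names are mine):

```python
from math import log2

theta = 0.3
h = (1 - theta) * log2(1 / (1 - theta)) + theta * log2(1 / theta)

# Leaves of the tree for the segment set {1, 01, 001, 000}; the internal
# nodes (proper prefixes) are the empty string, 0, and 00.
leaf_probs = {"1": theta,
              "01": (1 - theta) * theta,
              "001": (1 - theta) ** 2 * theta,
              "000": (1 - theta) ** 3}
node_probs = [1.0, 1 - theta, (1 - theta) ** 2]   # P(empty), P(0), P(00)

# F(x*) = depth: E[N(X*)] equals the sum of the node probabilities.
EN_direct = sum(p * len(seg) for seg, p in leaf_probs.items())
EN_lemma = sum(node_probs)
print(round(EN_direct, 6), round(EN_lemma, 6))    # 2.19 2.19

# F(x*) = -log2 P(x*): H(X*) equals E[N(X*)] h(theta).
H = -sum(p * log2(p) for p in leaf_probs.values())
print(abs(H - EN_lemma * h) < 1e-12)              # True
```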

    Proper-and-Complete Segment Sets: Leaf-Node Lemma

    http://find/

  • 8/18/2019 Willems is It 2013

    28/102

    LOSSLESS SOURCECODING ALGORITHMS

    Frans M.J. Willems

    INTRODUCTION

    HUFFMAN-TUNSTALL

    Binary IID Sources

    Huffman Code

    Tunstall Code

    ENUMERATIVE CODING

    Lexicographical Ordering

    FV: Pascal-∆ Method

    VF: Petry Code

    ARITHMETIC CODING

    Intervals

    Universal Coding,Individual Redundancy

    CONTEXT-TREEWEIGHTING

    IID, unknown  θ

    Tree Sources

    Context Trees

    Coding Prbs., Redundancy

    REPETITION TIMES

    LZ77

    Repetition Times, Kac

    Repetition-Time Algorithm

    Achieving Entropy

    CONCLUSION

    Proper and Complete Segment Sets: Leaf Node Lemma

    Assume that the source is IID with parameter  θ. Consider a set of segmentsand all their prefixes. Depict them in a tree. The segments are  leaves, theprefixes  nodes. Note that all the nodes and leaves have a probability. E.g.

    P (10) = θ(1 − θ). Let  F (·) be a function on nodes, leaves.

    F (∅)

    F (0)

    F (1)

    0

    1

    F (10)

    F (11)

    0

    1

    F (100)

    F (101)

    0

    1

    Lemma (Massey, 1983)

    l ∈leaves P (l )[F (l ) − F (∅)] =

    n∈nodes P (n)

    s ∈sons of  nP (s )

    P (n)[F (s ) − F (n)].

    Let  F (x ∗) = # of edges from  x ∗ to root, then

    E [N (X ∗)] =

    x ∗∈nodes

    P (x ∗).

    Let  F (x ∗) = − log2 P (x ∗), thenH (X ∗) = E [N (X ∗)]h(θ).

    Proper-and-Complete Segment Sets: Result


Theorem

For any proper-and-complete segment set with no more than 2^L segments,

L ≥ H(X*) = E[N(X*)] h(θ), or R = L/E[N(X*)] ≥ h(θ).

More precisely, since

R = L/E[N(X*)] = (L/H(X*)) h(θ),

we should make H(X*) as close as possible to L; hence all segments should have roughly the same probability.

    Tunstall’s Code


Consider 0 < θ ≤ 1/2.

Definition (Optimal VF-length Code)

A code that maximizes the expected segment length E[N(X*)] is called optimal. Such a code minimizes the rate R.

Theorem (Tunstall, 1967)

The Tunstall construction leads to an optimal code.

CONSTRUCTION:

Start with the empty segment ∅, which has unit probability. As long as the number of segments is smaller than 2^L, replace a segment s with largest probability P(s) by two segments s0 and s1. The probabilities of the new segments (leaves) are P(s0) = P(s)(1 − θ) and P(s1) = P(s)θ.

The Tunstall construction results in H(X*) ≥ L − log2(1/θ) and therefore R ≤ (L/(L + log2 θ)) h(θ) (Jelinek and Schneider [1972]).
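The construction is easy to sketch with a max-heap; an illustrative implementation (the names are mine), checked against the L = 3, θ = 0.3 example that follows:

```python
import heapq

def tunstall(theta, L):
    """Split a most probable leaf until 2**L segments exist."""
    heap = [(-1.0, '')]              # max-heap of (-P(segment), segment)
    while len(heap) < 2 ** L:
        p, s = heapq.heappop(heap)   # leaf with the largest probability
        heapq.heappush(heap, (p * (1 - theta), s + '0'))
        heapq.heappush(heap, (p * theta, s + '1'))
    return {s: -p for p, s in heap}

segments = tunstall(0.3, 3)
EN = sum(p * len(s) for s, p in segments.items())  # expected segment length
R = 3 / EN                          # rate in code bits per source symbol
```

For θ = 0.3 and L = 3 this gives E[N(X*)] ≈ 3.283 and R ≈ 0.914.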


    Tunstall’s Construction


Example

Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.

[Figure: Tunstall tree with 2^3 = 8 segments. Internal node probabilities: 1.00, .700, .490, .343, .300, .240, .210; leaf probabilities: .210, .168, .147, .147, .103, .090, .072, .063.]

Now E[N(X*)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and therefore R = 3/3.283 = 0.914.


    Remarks: Tunstall Code


Note that R ↓ h(θ) when L → ∞.

For θ ≈ 0 a Tunstall code has expected segment length E[N(X*)] ≈ 2^L − 1 and rate R ≈ L/(2^L − 1). Better than Huffman for L = N.

In each step of the Tunstall procedure, a leaf with the largest probability is changed into a node. This leads to the largest increase in expected segment length (Massey leaf-node lemma), and to P(n) ≥ P(l) for all nodes n and leaves l. Therefore, if n is the parent of leaf l, for any other leaf l′ we have P(l) ≥ θP(n) ≥ θP(l′). So leaves cannot differ too much in probability. This fact is used to lower-bound H(X*): H(X*) ≥ L − log2(1/θ).

Optimal VF-length codes can also be found by fixing a number γ and defining a node to be internal if its probability is ≥ γ (Khodak, 1969). The size of the segment set is then not completely controllable.

Run-length codes (Golomb [1966]).

    Outline


1 INTRODUCTION
2 HUFFMAN and TUNSTALL: Binary IID Sources, Huffman Code, Tunstall Code
3 ENUMERATIVE CODING: Lexicographical Ordering, FV: Pascal-∆ Method, VF: Petry Code
4 ARITHMETIC CODING: Intervals, Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING: IID, unknown θ, Binary Tree-Sources, Context Trees, Coding Probabilities
6 REPETITION TIMES: LZ77, Repetition Times, Kac, Repetition-Time Algorithm, Achieving Entropy
7 CONCLUSION

    Lexicographical Ordering


IDEA:

Sequences having the same weight (and probability) only need to be INDEXED. The binary representation of the index can be taken as codeword.

Definition (Lexicographical Ordering, Index)

In a lexicographical ordering (0 < 1) we say that x^N_1 < x̃^N_1 if there is an n such that x_i = x̃_i for i < n while x_n < x̃_n. The index i_S(x^N_1) of a sequence in an ordered set S is the number of sequences in S that precede it.


Theorem (Cover, 1973)

From the sequence x^N_1 ∈ S we can compute the index

i_S(x^N_1) = Σ_{n=1,N: x_n=1} #S(x_1, x_2, ..., x_{n−1}, 0),

where #S(x_1, x_2, ..., x_k) denotes the number of sequences in S having prefix x_1, ..., x_k. Moreover, from the index i_S(x^N_1) the sequence x^N_1 can be computed if the numbers #S(x_1, x_2, ..., x_{n−1}, 0) for n = 1, N are available. The index of a sequence can be represented by a codeword of fixed length ⌈log2 |S|⌉.

Example

Index i_S(10100) = #S(0) + #S(100) = (4 choose 2) + (2 choose 1) = 6 + 2 = 8; hence, since |S| = 10, the corresponding codeword is 1000.
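Cover's formula and its inverse are a few lines of code; a sketch for the fixed-weight set S (the function names are mine):

```python
from math import comb

def index_fixed_weight(x):
    """i_S(x) for S = {length-N sequences with the weight of x},
    ordered lexicographically with 0 < 1."""
    N, ones_left = len(x), x.count('1')
    i = 0
    for n, bit in enumerate(x, start=1):
        if bit == '1':
            # #S(x_1 .. x_{n-1}, 0): N - n free positions, ones_left ones
            i += comb(N - n, ones_left)
            ones_left -= 1
    return i

def sequence_from_index(i, N, w):
    """Invert the index using the same prefix counts."""
    x, ones_left = [], w
    for n in range(1, N + 1):
        c = comb(N - n, ones_left)   # sequences that continue with a 0
        if i >= c:
            x.append('1'); i -= c; ones_left -= 1
        else:
            x.append('0')
    return ''.join(x)
```

index_fixed_weight('10100') returns 8, the value of the example, and sequence_from_index(8, 5, 2) recovers 10100.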


    FV: Pascal-Triangle Method (cont.)


First note that

H(X^N_1) = H(X^N_1, w(X^N_1)) = H(W) + H(X^N_1 | W).

If we use enumerative coding for X^N_1 given weight w, since all sequences with a fixed weight have equal probability,

E[L(X^N_1 | W)] = Σ_{w=0,1,...,N} P(w) ⌈log2 (N choose w)⌉ < Σ_{w=0,1,...,N} P(w) log2 (N choose w) + 1 = H(X^N_1 | W) + 1.

If W is encoded using a Huffman code we obtain

E[L(X^N_1)] = E[L(W)] + E[L(X^N_1 | W)] ≤ H(W) + 1 + H(X^N_1 | W) + 1 = H(X^N_1) + 2.

Worse than Huffman, but no big code-table is needed.
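A sketch of the two-part code, with one simplification of mine: the weight is sent in a fixed-length field of ⌈log2(N + 1)⌉ bits instead of the Huffman code of the text.

```python
from math import comb, ceil, log2

def encode_weight_then_index(x):
    """Send w(x) first, then the enumerative index of x inside its
    weight class, each as a fixed-length binary field."""
    N, w = len(x), x.count('1')
    wbits = max(1, ceil(log2(N + 1)))   # field for the weight
    # Cover's index within the weight class
    i, ones_left = 0, w
    for n, bit in enumerate(x, start=1):
        if bit == '1':
            i += comb(N - n, ones_left)
            ones_left -= 1
    ibits = ceil(log2(comb(N, w))) if comb(N, w) > 1 else 0
    code = format(w, f'0{wbits}b')
    if ibits:
        code += format(i, f'0{ibits}b')
    return code
```

For x = 10100 this yields 010 (w = 2) followed by 1000 (index 8), seven bits in total.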


    VF: Petry Code


IDEA:

Modify the Tunstall segment sets such that the segments can be indexed.

Again let 0 < θ ≤ 1/2. It can be shown that a proper-and-complete segment set is a Tunstall set (maximal E[N(X*)] given the number of segments) if and only if P(n) ≥ P(l) for all nodes n and all leaves l.

Consequence

If the segments x* in a proper-and-complete segment set satisfy

P(x*−1) > γ ≥ P(x*),

where x*−1 is x* without its last symbol, this segment set is a Tunstall set. The constant γ determines the size of the set.

Since

P(x*) = (1 − θ)^{n0(x*)} θ^{n1(x*)},

where n0(x*) is the number of zeros in x* and n1(x*) the number of ones in x*, this is equivalent to

A·n0(x*−1) + B·n1(x*−1) < C ≤ A·n0(x*) + B·n1(x*)

for A = −log_b(1 − θ), B = −log_b θ, C = −log_b γ, and some log-base b.
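The condition translates directly into a recursive construction; an illustrative sketch for integer A, B, C (the A = 1, B = 2 case corresponds to θ = q ≈ 0.382):

```python
def petry_segments(A, B, C):
    """Segments x* with A*n0(x*-1) + B*n1(x*-1) < C <= A*n0(x*) + B*n1(x*):
    extend a string until its score first reaches C."""
    segments, stack = [], [('', 0)]
    while stack:
        s, score = stack.pop()
        if score >= C:                          # score of the parent was < C
            segments.append(s)
        else:
            stack.append((s + '0', score + A))  # a 0 adds A to the score
            stack.append((s + '1', score + B))  # a 1 adds B to the score
    return segments

segs = petry_segments(1, 2, 4)   # 8 segments for this C
```

By construction the set is proper and complete (the leaf probabilities sum to 1 for any θ); for A = 1, B = 2 the set sizes follow the Fibonacci recursion σ(C) = σ(C − 1) + σ(C − 2).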

    VF: Petry Code (cont.)


Note that the log-base b has to satisfy

1 = (1 − θ) + θ = b^{−A} + b^{−B}.

For special values of θ, A and B are integers. E.g. for θ = (1 − θ)^2 we obtain A = 1 and B = 2 for b = (1 + √5)/2. Now C can also be assumed to be integer. The corresponding codes are called Petry codes.

Definition (Petry (Schalkwijk), 1982)

Fix integers A and B. The segments x* in a proper-and-complete Petry segment set satisfy

A·n0(x*−1) + B·n1(x*−1) < C ≤ A·n0(x*) + B·n1(x*).

Integer C can be chosen to control the size of the set.

Linear Array

Petry codes can be implemented using a linear array.


    VF: Petry Code (cont.)


Theorem (Tjalkens & W. (1987))

A Petry code with parameters A < B and C is a Tunstall code for parameter q, where q = b^{−B} and b is the solution of b^{−A} + b^{−B} = 1. For arbitrary θ the rate

log2 σ(C) / E[N(X*)] ≤ ((C + (B − 1))/C) (h(θ) + d(θ‖q)).

Example

In the table, q for several values of A and B:

A \ B    2       3       4       5
1        0.382   0.318   0.276   0.245
2                0.4302  0.382   0.346
3                        0.450   0.412
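The q values in the table come from solving b^{−A} + b^{−B} = 1; a quick bisection sketch (my own helper) reproduces them:

```python
def petry_q(A, B, tol=1e-12):
    """Solve b**-A + b**-B = 1 for b in (1, 2) and return q = b**-B.
    The left side decreases in b, equals 2 at b = 1 and is < 1 at b = 2."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** -A + mid ** -B > 1.0:
            lo = mid                 # b still too small
        else:
            hi = mid
    return hi ** -B
```

petry_q(1, 2) gives 0.38197, the golden-ratio case θ = (1 − θ)^2.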

    Remarks: VF Petry Code


    Note that log2 σ(C )/E [N (X ∗)] ↓ h(θ) +  d (θ|q ) when  C  → ∞, hence

    a Petry code achieves entropy for  θ = q .

    Tjalkens and W. investigated VF-length Petry codes for Markovsources, again with a linear array for each state.

    VF-length universal enumerative solutions exist (Lawrence [1977],Tjalkens and W. [1992]).

    The numbers in the linear array show  exponential behaviour. Also anarray 2−i /M f    for  i  = 1, M  can be used, through which we makesteps  (Tjalkens [PhD, 1987]). This reduces the storage complexityand is similar to Rissanen [1976] multiplication-avoiding arithmeticcoding (Generalized Kraft inequality).


    Idea Elias


Elias:

If source sequences are ORDERED LEXICOGRAPHICALLY, then codewords can be COMPUTED SEQUENTIALLY from the source sequence using conditional PROBABILITIES of the next symbol given the previous ones, and vice versa.

    Source Intervals


Definition

Order the source sequences x^N_1 ∈ {0, 1}^N lexicographically according to 0 < 1. Now, to each source sequence x^N_1 ∈ {0, 1}^N there corresponds a source interval

I(x^N_1) = [Q(x^N_1), Q(x^N_1) + P(x^N_1))

with

Q(x^N_1) = Σ_{x̃^N_1 < x^N_1} P(x̃^N_1).


A codeword c with length L can be regarded as a binary fraction .c. If we concatenate this codeword with others, the corresponding fraction can increase, but by no more than 2^{−L}.

Definition

To a codeword c(x^N_1) with length L(x^N_1) there corresponds a code interval

J(x^N_1) = [.c(x^N_1), .c(x^N_1) + 2^{−L(x^N_1)}).

Note that J(x^N_1) ⊂ [0, 1).


    Arithmetic coding: Encoding and Decoding


Procedure

ENCODING: Choose c such that the code interval ⊆ source interval, i.e.

[.c, .c + 2^{−L}) ⊆ [Q(x^N_1), Q(x^N_1) + P(x^N_1)).

DECODING: Is possible since there is only one source interval that contains the code interval.

Theorem

For a sequence x^N_1 with source interval I(x^N_1) = [Q(x^N_1), Q(x^N_1) + P(x^N_1)), take c(x^N_1) as the codeword with

L(x^N_1) ≜ ⌈log2 (1/P(x^N_1))⌉ + 1,
.c(x^N_1) ≜ ⌈Q(x^N_1) · 2^{L(x^N_1)}⌉ · 2^{−L(x^N_1)}.

Then

J(c(x^N_1)) ⊆ I(x^N_1)

and

L(x^N_1) < log2 (1/P(x^N_1)) + 2,

i.e. less than two bits above the ideal codeword length.
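The theorem translates line by line into code; a sketch using exact fractions, applied to the θ = 0.2, N = 2 source of the example slide:

```python
from fractions import Fraction
from math import ceil, log2

def elias_codeword(P, Q):
    """L = ceil(log2 1/P) + 1 and .c = ceil(Q * 2^L) * 2^-L,
    so that [.c, .c + 2^-L) fits inside [Q, Q + P)."""
    L = ceil(log2(1 / P)) + 1
    c = ceil(Q * 2 ** L)             # integer value of the L code bits
    return format(c, f'0{L}b')

theta = Fraction(1, 5)               # P(1) = 0.2
P = {'00': (1 - theta) ** 2, '01': (1 - theta) * theta,
     '10': theta * (1 - theta), '11': theta ** 2}
Q, total = {}, Fraction(0)
for s in sorted(P):                  # lexicographic order, 0 < 1
    Q[s] = total
    total += P[s]
codes = {s: elias_codeword(P[s], Q[s]) for s in P}
```

This reproduces the codewords 00, 1011, 1101, 111110 of the example; each code interval lies inside its source interval, so the code is prefix-free.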


Example

I.I.D. source with θ = 0.2 and N = 2.

Source intervals: 00: [0.00, 0.64); 01: [0.64, 0.80); 10: [0.80, 0.96); 11: [0.96, 1.00).

Code intervals: 00 → 00, [0, 1/4); 01 → 1011, [11/16, 3/4); 10 → 1101, [13/16, 7/8); 11 → 111110, [31/32, 63/64).

Source intervals are disjoint ⇒ code intervals are disjoint ⇒ the prefix condition holds.



    Arithmetic Coding: Sequential Computation (Elias)


Example (Connection to Cover’s formula)

Let L = 3 and θ = 0.2.

(Figure: binary code tree of depth 3 with nodes 0, 1, 00, 01, 10, 11, 000, ..., 111.)

Q(101) = P(0) + P(100) = 0.8 + 0.2 · 0.8 · 0.8 = 0.928.
P(101) = P(1)P(0)P(1) = 0.2 · 0.8 · 0.2 = 0.032.


In general

Q(x_1^N) = Σ_{n=1,...,N: x_n=1} P(x_1, x_2, ..., x_{n-1}, 0),
P(x_1^N) = Π_{n=1,...,N} P(x_n | x_1, x_2, ..., x_{n-1}).

Sequential Computation

If we have access to P(x_1, x_2, ..., x_n, 0) and P(x_1, x_2, ..., x_n, 1) after having processed P(x_1, x_2, ..., x_n) for n = 1, 2, ..., N, we can compute I(x_1^N) sequentially.
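As a sketch (not from the slides), the sequential computation of the source interval for a binary IID source with P(1) = θ, using Cover's formula for Q:

```python
def source_interval(x, theta):
    # Start with the whole unit interval [0, 1).
    Q, P = 0.0, 1.0
    for bit in x:
        if bit == 1:
            Q += P * (1.0 - theta)   # skip the subinterval of ...x_{n-1} 0
            P *= theta
        else:
            P *= 1.0 - theta
    return Q, P                      # source interval is [Q, Q + P)

# Cover's example: x = 101, theta = 0.2 gives Q = 0.928, P = 0.032.
print(source_interval([1, 0, 1], 0.2))
```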


    Universal Coding


Coding Probabilities

If the actual probabilities P(x_1^N) are not known, arithmetic coding is still possible if instead of P(x_1^N) we use coding probabilities P_c(x_1^N) satisfying

P_c(x_1^N) > 0 for all x_1^N, and Σ_{x_1^N} P_c(x_1^N) = 1.

Then

L(x_1^N) < log_2 (1/P_c(x_1^N)) + 2.

(Block diagram: both the encoder, which maps x_1^N to c(x_1^N), and the decoder, which recovers x_1^N, sequentially compute P_c(x_1 ··· x_{n-1}, 0) and P_c(x_1 ··· x_{n-1}, 1) for n = 1, ..., N.)

PROBLEM: How do we choose the coding probabilities P_c(x_1^N)?


    Individual Redundancy


Definition

The individual redundancy ρ(x_1^N) of a sequence x_1^N is defined as

ρ(x_1^N) = L(x_1^N) - log_2 (1/P(x_1^N)),

i.e. codeword length minus ideal codeword length.

Bound on the Individual Redundancy

Arithmetic coding based on coding probabilities {P_c(x_1^N), x_1^N ∈ {0, 1}^N} yields

ρ(x_1^N) < log_2 (1/P_c(x_1^N)) + 2 - log_2 (1/P(x_1^N)) = log_2 (P(x_1^N)/P_c(x_1^N)) + 2.

We say that the CODING redundancy is < 2 bits. The coding probabilities should be as large as possible (as close as possible to the actual probabilities). Next we focus on the remaining part of the individual redundancy, log_2 (P(x_1^N)/P_c(x_1^N)).


    Remarks: Arithmetic Coding


Shannon [1948] already described the relation between codewords and intervals, with ordered probabilities however. Called the Shannon-Fano code.

Shannon-Fano-Elias: arbitrary ordering, but not sequential.

Finite-precision issues in arithmetic coding were solved by Pasco [1976] and Rissanen [1976].


    Outline

    1   INTRODUCTION


2   HUFFMAN and TUNSTALL
    Binary IID Sources
    Huffman Code
    Tunstall Code

3   ENUMERATIVE CODING
    Lexicographical Ordering
    FV: Pascal-∆ Method
    VF: Petry Code

4   ARITHMETIC CODING
    Intervals
    Universal Coding, Individual Redundancy

5   CONTEXT-TREE WEIGHTING
    IID, unknown θ
    Binary Tree-Sources
    Context Trees
    Coding Probabilities

6   REPETITION TIMES
    LZ77
    Repetition Times, Kac
    Repetition-Time Algorithm
    Achieving Entropy

7   CONCLUSION


    CTW: Universal Codes


    IDEA:

    Find good coding probabilities for sources with UNKNOWNPARAMETERS and STRUCTURE. Use WEIGHTING!


    Coding for a Binary IID Source, Unknown  θ


Definition (Krichevsky-Trofimov estimator (1981))

A good coding probability P_c(x_1^N) for a sequence x_1^N that contains a zeroes and b = N - a ones is

P_e(a, b) = ∫_0^1 (1/(π √((1 - θ)θ))) · (1 - θ)^a θ^b dθ.

(Dirichlet-(1/2, 1/2) prior, "weighting").

Theorem

Upper bound on the PARAMETER redundancy:

log_2 (P(x_1^N)/P_c(x_1^N)) = log_2 ((1 - θ)^a θ^b / P_e(a, b)) ≤ (1/2) log_2 (a + b) + 1 = (1/2) log_2 (N) + 1,

for all θ and x_1^N with a zeroes and b ones.

Probability of a sequence with a zeroes and b ones followed by a zero:

P_e(a + 1, b) = (a + 1/2)/(a + b + 1) · P_e(a, b),

hence SEQUENTIAL COMPUTATION is possible!
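The sequential update leads directly to a tiny estimator routine; a sketch (not from the slides, function name is mine):

```python
def kt_prob(x):
    # Krichevsky-Trofimov coding probability of binary sequence x, computed
    # sequentially: P(next = 0) = (a + 1/2)/(a + b + 1) after a zeroes, b ones.
    a, b, P = 0, 0, 1.0
    for bit in x:
        if bit == 0:
            P *= (a + 0.5) / (a + b + 1)
            a += 1
        else:
            P *= (b + 0.5) / (a + b + 1)
            b += 1
    return P

# P_e depends only on the counts (a, b), not on the order of the symbols.
print(kt_prob([0, 1, 0, 0]))  # P_e(3, 1) = 5/128
```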


    Individual Redundancy Binary IID source


The total individual redundancy

ρ(x_1^N) < log_2 ((1 - θ)^a θ^b / P_e(a, b)) + 2 ≤ (1/2) log_2 (N) + 1 + 2,

for all θ and x_1^N with a zeroes and b ones.

Shtarkov [1988]: the (1/2) log_2 N behaviour is asymptotically optimal for the individual redundancy as N → ∞ (NML estimator)!
Rissanen [1984]: also for the expected redundancy the (1/2) log_2 N behaviour is asymptotically optimal.


    CTW: Binary Tree-Sources


Definition

(Figure: context tree with leaves 1, 10, and 00; the context ··· x_{n-2} x_{n-1} selects the parameter used to generate x_n.)

(tree-) model M = {00, 10, 1}, parameters θ_1 = 0.1, θ_10 = 0.3, θ_00 = 0.5:

P(X_n = 1 | ···, X_{n-1} = 1) = 0.1
P(X_n = 1 | ···, X_{n-2} = 1, X_{n-1} = 0) = 0.3
P(X_n = 1 | ···, X_{n-2} = 0, X_{n-1} = 0) = 0.5
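To make the definition concrete, a sketch (not from the slides) that computes the probability of a sequence under this particular model M = {00, 10, 1}; the argument `past` supplies the symbols preceding x_1 that determine the first contexts:

```python
def tree_source_prob(x, past, theta):
    # theta[s] = P(X_n = 1 | context s), context strings read as x_{n-2} x_{n-1}.
    hist = list(past)
    P = 1.0
    for bit in x:
        # Walk from x_{n-1} back until a leaf of the model is reached.
        if hist[-1] == 1:
            s = '1'
        elif hist[-2] == 1:
            s = '10'
        else:
            s = '00'
        P *= theta[s] if bit == 1 else 1.0 - theta[s]
        hist.append(bit)
    return P

theta = {'1': 0.1, '10': 0.3, '00': 0.5}
# Past ...00, then symbols 1, 1, 0: 0.5 * 0.1 * 0.9 = 0.045.
print(tree_source_prob([1, 1, 0], [0, 0], theta))
```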


    CTW: Problem, Concepts


PROBLEM:

What are good coding probabilities for sequences x_1^N produced by a tree-source with

an unknown tree-model,
and unknown parameters?

CONCEPTS:

CONTEXT TREE (Rissanen [1983])

WEIGHTING: If P_1(x) and P_2(x) are two alternative coding probabilities for sequence x, then the weighted probability

P_w(x) ≜ (P_1(x) + P_2(x))/2 ≥ (1/2) max(P_1(x), P_2(x)),

thus we lose at most a factor of 2, which is one bit in redundancy.
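The one-bit bound is easy to check numerically; a sketch with hypothetical alternative coding probabilities:

```python
from math import log2

def weighted(P1, P2):
    # Weighted coding probability: at least half of the better alternative.
    return (P1 + P2) / 2

P1, P2 = 0.40, 0.01             # hypothetical coding probabilities
Pw = weighted(P1, P2)
extra = log2(max(P1, P2) / Pw)  # redundancy cost of weighting, in bits
print(extra)                    # always <= 1 bit
```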


    Context Trees


Definition (Context Tree)

(Figure: full binary context tree of depth D = 3 with nodes 0, 1, 00, 10, 01, 11, 000, 100, 010, 110, 001, 101, 011, 111.)

Node s contains the sequence of source symbols that have occurred following context s. Depth is D.


The context tree splits up the sequence into subsequences

    Example