05A Compression


  • 8/6/2019 05A Compression


    05A-compression.fm 1 15.March.01

    Scope

    Contents

    http://www.kom.e-technik.tu-darmstadt.de
    http://www.tk.informatik.tu-darmstadt.de

    R. Steinmetz, M. Mühlhäuser

    Multimedia-Systems: Compression

    Prof. Dr.-Ing. Ralf Steinmetz
    Prof. Dr. Max Mühlhäuser

    MM: TU Darmstadt - Darmstadt University of Technology,
    Dept. of Computer Science
    TK - Telecooperation, Tel. +49 6151 16-3709,
    Alexanderstr. 6, D-64283 Darmstadt, Germany, [email protected] Fax. +49 6151 16-3052

    RS: TU Darmstadt - Darmstadt University of Technology,
    Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science
    KOM - Industrial Process and System Communications, Tel. +49 6151 166151,
    Merckstr. 25, D-64283 Darmstadt, Germany, [email protected] Fax. +49 6151 166152

    GMD - German National Research Center for Information Technology
    httc - Hessian Telemedia Technology Competence-Center e.V.


    Scope

    [Figure: layered overview of the course scope]
    Usage: Applications; Learning & Teaching; Design; User Interfaces
    Services: Content Processing; Documents; Security; ...; Synchronization; Group Communications
    Systems: Databases; Programming; Media-Server; Operating Systems; Communications; Opt. Memories; Quality of Service; Networks
    Basics: Computer Architectures; Compression; Image & Graphics; Animation; Video; Audio


    Contents

    1. Motivation

    2. Requirements - General

    3. Fundamentals - Categories

    4. Source Coding

    5. Entropy Coding:

    6. Hybrid Coding: Basic Encoding Steps

    7. JPEG

    8. H.261 and related ITU Standards

    9. MPEG-1

    10. MPEG-2

    11. MPEG-4

    12. Wavelets

    13. Fractal Image Compression

    14. Basic Audio and Speech Coding Schemes

    15. Conclusion


    1. Motivation

    Uncompressed digital media in computing require, for example:

    Text:
    1 page with 80 char/line, 64 lines/page and 2 Byte/char
    80 x 64 x 2 x 8 = 80 kBit/page

    Image:
    24 Bit/pixel, 512 x 512 pixel/image
    512 x 512 x 24 = 6 MBit/image

    Audio:
    CD quality, sample rate 44.1 kHz, 16 Bit/sample
    Mono: 44.1 x 16 = 706 kBit/s
    Stereo: 1.412 MBit/s

    Video:
    full frames with 1024 x 1024 pixel/frame, 24 Bit/pixel, 30 frames/s
    1024 x 1024 x 24 x 30 = 720 MBit/s
    more realistic: 360 x 240 pixel/frame = 60 MBit/s

    Hence compression is NECESSARY
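    The figures on this slide can be re-derived with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and it assumes that the slide's "kBit"/"MBit" for text, image and video mean 1024 and 1024 x 1024 bit (the audio figure matches a decimal kBit).

```python
# Raw (uncompressed) data volumes and rates for the media types listed above.
KBIT = 1024            # assumption: binary prefix, matches "80 kBit/page"
MBIT = 1024 * 1024     # assumption: binary prefix, matches "6 MBit/image"

def image_bits(width, height, bits_per_pixel):
    """Bits needed for one uncompressed image or video frame."""
    return width * height * bits_per_pixel

def video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Bit rate of an uncompressed video stream."""
    return image_bits(width, height, bits_per_pixel) * frames_per_second

text_page = 80 * 64 * 2 * 8                  # char/line x lines x Byte/char x 8
still_image = image_bits(512, 512, 24)
audio_mono = 44_100 * 16                     # samples/s x bit/sample
audio_stereo = 2 * audio_mono
video_full = video_bits_per_second(1024, 1024, 24, 30)
video_small = video_bits_per_second(360, 240, 24, 30)

print(f"text page   : {text_page / KBIT:.0f} kBit")
print(f"still image : {still_image / MBIT:.0f} MBit")
print(f"audio mono  : {audio_mono / 1000:.1f} kBit/s")
print(f"audio stereo: {audio_stereo / 1e6:.3f} MBit/s")
print(f"video full  : {video_full / MBIT:.0f} MBit/s")
print(f"video small : {video_small / MBIT:.1f} MBit/s")
```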


    2. Requirements - General

    high quality

    compression

    low delay

    low complexity (e.g., ease of decoding)

    efficient implementation (e.g., memory req.)

    intrinsic scalability


    Requirements

    DIALOGUE AND RETRIEVAL mode requirements:

    Independence of frame size and video frame rate

    Synchronization of audio, video, and other media

    DIALOGUE mode requirements:

    Compression and decompression in real-time (e.g. 25 frames/s)

    End-to-end delay < 150 ms

    RETRIEVAL mode requirements:

    Fast forward and backward data retrieval

    Random access within 1/2 s

    Software and/or hardware-assisted implementation requirements


    3. Fundamentals - Categories

    source coding
    - based on semantics of the data
    - often lossy

    channel coding
    - adaptation to communication channel
    - introduction of redundancy

    hybrid coding
    - entropy and source coding

    entropy coding
    - ignoring semantics of the data
    - lossless


    Categories and Techniques

    Entropy Coding:
    - Run-Length Coding
    - Huffman Coding
    - Arithmetic Coding

    Source Coding:
    - Prediction: DPCM, DM
    - Transformation: FFT, DCT
    - Layered Coding: Bit Position, Subsampling, Sub-Band Coding
    - Vector Quantization

    Hybrid Coding:
    - JPEG
    - MPEG
    - H.261, H.263
    - proprietary: Quicktime, ...


    Categories & Techniques, Cont. (1)

    Two principal possibilities:
    1. Entropy Coding: eliminate redundancy (thus, lossless)
    2. Reduction Coding: eliminate irrelevance / low relevance (lossy)

    Preparatory step: Decorrelation - eliminate interdependencies
    - this is the essence of source coding
    - changes the "representation" of the media
    - goal usually: reduce dependencies between data
    - as such, it is a preparatory step!
    - usually, does not compress by itself

    Steps in hybrid coding (often): decorrelation - reduction - entropy coding
    - often: reduction by quantization
    - last step: additional compression without harm

    note: the literature usually uses the terms as on the previous slide!
    note: reduction coding is "smart deletion", not really "compression"


    Categories and Techniques, Cont. (2)

    Major distinction: Symmetric / Asymmetric
    Asym. (usually): more effort for compression
    - o.k. if compression is non real-time, "only once" (movie!)
    - may involve number-crunchers (... owned by content provider)
    Symmetric: "required" for real-time, e.g., videoconferencing
    - in reality, often not 100% symmetric

    Further considerations include, e.g.,
    Adjustable compression rate? ... quality?
    "smooth" bit stream ("isochronous")?
    - terms: CBR (const. bit rate) vs. VBR (variable bit rate)
    - may be "over time": e.g., packet size Big Small Small Big Small Small ...
    - may be simulated w/ loop-back filter plus buffer
    "progressive" (mainly: non-continuous media): display-while-download
    "streaming": ~ same for video (here, rather an issue of software)
    more subtle issues:
    - "open" standard?
    - good "performance" (ratio, speed) for all kinds of media?
    - bullet-proof, well-understood?

    ...


    4. Source Coding

    DPCM

    DPCM = Differential Pulse-Code Modulation

    Assumptions:

    Consecutive samples or frames have similar values

    Prediction is possible due to existing correlation

    Fundamental Steps:

    Incoming sample or frame (pixel or block) is predicted by means of previously processed data

    Difference between incoming data and prediction is determined

    Difference is quantized

    Challenge: optimal predictor

    Further predictive coding technique:

    Delta modulation (DM): 1 bit as difference signal
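    The fundamental DPCM steps can be sketched as follows. This is a minimal illustration, not a standardized codec: it assumes the simplest possible predictor (the previously reconstructed sample) and a uniform quantizer with a hypothetical step size `step`; the function names are invented for this sketch.

```python
def dpcm_encode(samples, step=4):
    """Predict each sample from the previous reconstruction, quantize the difference."""
    prediction = 0
    codes = []
    for sample in samples:
        difference = sample - prediction
        quantized = round(difference / step)   # lossy step (lossless if step == 1)
        codes.append(quantized)
        # Track what the decoder will reconstruct, so errors do not accumulate.
        prediction += quantized * step
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild the signal by accumulating the dequantized differences."""
    prediction = 0
    samples = []
    for quantized in codes:
        prediction += quantized * step
        samples.append(prediction)
    return samples

signal = [100, 102, 105, 104, 101, 99]
codes = dpcm_encode(signal, step=4)
restored = dpcm_decode(codes, step=4)
print(codes)      # small numbers after the first sample
print(restored)   # each value within step/2 of the original
```

    Because the encoder predicts from the reconstructed value rather than from the original sample, the reconstruction error per sample stays bounded by half the quantizer step.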


    Source Coding: Transformation

    Assumptions:

    Data in the transformed domain is easier to compress

    Related processing is feasible

    Example:

    FFT: Fast Fourier Transformation

    DCT: Discrete Cosine Transformation

    [Figure: time domain --> (Fourier Transformation) --> frequency domain; frequency domain --> (Inverse Fourier Transformation) --> time domain]


    Source Coding: Sub-Band

    Assumption:

    Some frequency ranges are more important than others

    Example:

    [Figure: frequency spectrum of the signal (axis: frequency), split into bands with separate transformation / coding]

    Application:

    vocoder for speech communication

    MPEG audio


    Entropy Coding: Principle

    Entropy (in information theory): information content / "density"
    - symbols/words equally likely: high entropy (full of information)
    - otherwise: lower entropy (suboptimal representation of info, less dense)

    Entropy formula:
    $H(P) = -\sum_{i} p(i) \log_B p(i)$

    example: given 4 possible symbols (words) in source code
    i) IF all equal, p = 1/4: H(P) = 2
    ii) IF p = 1/2, 1/4, 1/8, 1/8: H(P) = 1 6/8

    "Entropy coding" means:
    - mean code length per symbol equals (~almost) the entropy
    - in ii) above, with B=2 (binary):
      p=1/2 --> code length -log2(1/2) = -(-1) = 1 bit; p=1/4 --> 2 bits, etc.
    - GOAL: find code w/ symbol length as close as possible to -log_B p(i)

    [Figure: two grey-level histograms (probability over grey levels), one with high entropy, one with low entropy]

    note: seems "little information" to us since it is very regular; this is not
    covered by the entropy formula, yet may be used for compression (e.g. run length)

    here: "little info" because "most of picture is in same gray"
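    The two entropy values of the example can be checked numerically. The function below is a direct transcription of the entropy formula; its name and the default base are choices made for this sketch.

```python
import math

def entropy(probabilities, base=2):
    """H(P) = -sum p(i) * log_B p(i); symbols with p = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

uniform = [1/4, 1/4, 1/4, 1/4]
skewed = [1/2, 1/4, 1/8, 1/8]

print(entropy(uniform))   # about 2 bits per symbol
print(entropy(skewed))    # about 1.75 bits per symbol

# Ideal code lengths -log2 p(i) for the skewed source: 1, 2, 3, 3 bits.
ideal_lengths = [-math.log2(p) for p in skewed]
print(ideal_lengths)
```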


    5. Entropy Coding:

    Run-Length (only marginal relation to entropy)

    Assumption:

    Long sequences of identical symbols

    Example:

    ... A B C E E E E E E D A C B ...
        --> compression -->
    ... A B C E ! 6 D A C B ...
    (E: symbol, !: special flag, 6: number of occurrences)

    Special variant: zero-length encoding
    - only repetitions of zeroes count
    - in the encoded part "E ! 6" above (red in the original slide), the "symbol" is then not needed (i.e. "pays" for >2 repetitions)
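    A small sketch of the flag-based run-length scheme from the example. It assumes that the flag "!" never occurs in the data and that a run is only replaced when it has at least `min_run` symbols; both the function names and the `min_run` parameter are assumptions of this sketch.

```python
def rle_encode(symbols, flag="!", min_run=4):
    """Replace runs of >= min_run identical symbols by: symbol, flag, count."""
    out = []
    i = 0
    while i < len(symbols):
        j = i
        while j < len(symbols) and symbols[j] == symbols[i]:
            j += 1
        run = j - i
        if run >= min_run:
            out += [symbols[i], flag, run]   # 3 items replace `run` items
        else:
            out += [symbols[i]] * run        # short runs are copied unchanged
        i = j
    return out

def rle_decode(encoded, flag="!"):
    """Expand every (symbol, flag, count) triple; copy everything else."""
    out = []
    i = 0
    while i < len(encoded):
        if i + 1 < len(encoded) and encoded[i + 1] == flag:
            out += [encoded[i]] * encoded[i + 2]
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out

data = list("ABCEEEEEEDACB")
packed = rle_encode(data)
print(packed)
print("".join(rle_decode(packed)))
```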


    Entropy Coding: Huffman

    Basics:

    Assumption: some symbols occur more often than others

    E.g., character frequencies of the English language

    Idea: frequent symbols --> shorter bit strings (cf. Entropy!)

    Example: Characters to be encoded: A, B, C, D, E

    probability to occur: p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15

    symbol  probability  code
    A       30%          11
    B       30%          10
    C       10%          011
    D       15%          010
    E       15%          00

    coding tree (branches labelled 1 / 0):
    step 1: scan all leaves, assign (1,0) to the two with lowest probability -> intermediate root (here: C + D = 25%)
    steps 2-n: scan current "tops" (intermediate roots or leaves), assign (1,0) to the two with lowest probability, -> ...
    (here: step 2: 25% + E = 40%; step 3: A + B = 60%; step 4: 60% + 40% = 100%)
    end: assign codes by descending tree until leaves, bits encountered represent code
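    The tree construction can be sketched with a priority queue. This is an illustrative implementation, not the slide's exact procedure: the 0/1 labelling of branches and the tie-breaking between equal probabilities are arbitrary choices here, so the bit patterns may differ from the table while the code lengths (2, 2, 3, 3, 2 bits) agree.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Build a prefix code by repeatedly merging the two least likely 'tops'."""
    tiebreak = count()   # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

def huffman_decode(bits, code):
    """A prefix code lets the decoder detect the end of each symbol itself."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

weights = {"A": 30, "B": 30, "C": 10, "D": 15, "E": 15}   # percentages
code = huffman_code(weights)
message = "BACDABEBAE"
bits = "".join(code[s] for s in message)
print(code)
print(bits, "->", huffman_decode(bits, code))
```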


    Entropy Coding: Huffman

    Table and example of application to data stream

    symbol  code
    A       11
    B       10
    C       011
    D       010
    E       00

    bit stream: 10 11 011 010 11 10 00 10 11 00
    symbols:    B  A  C   D   A  B  E  B  A  E

    note: decoder may auto-detect end-of-symbol in bit-stream

    Other types of Entropy encoding

    Arithmetic Encoding (1)
    - most direct application of entropy principles!
    - symbols occupy sub-intervals of [0,1) according to their probabilities
    - successive coding of symbols cuts out the corresponding sub-interval in the sub-(sub-sub-...) interval chosen so far
    - last symbol --> last sub-interval
    - choose an "arbitrary" ("short") number in this sub-interval --> transmit/store
    - requires consideration of "additional" symbol "end-of-word" (below: "!")


    Arithmetic Coding Example

    symbols a, b, c, d, !; how to encode "bbd!"?

    symbol       a        b         c         d         !
    probability  0.1      0.5       0.1       0.2       0.1
    interval     [0,.1)   [.1,.6)   [.6,.7)   [.7,.9)   [.9,1)

    1. divide [0,1) according to probabilities
    2. b: restrict to [.1,.6)
    3. b: sub-interval [.1,.6) of [.1,.6), i.e. [.15,.4)
    4. d: sub-interval [.7,.9) of [.15,.4)
    5. !: sub-interval [.9,1) of the interval from step 4

    transmit/store one (arbitrary) value X in the final sub-interval (red in the original figure)
    decoder: X lies in [.1,.6) --> 1st symbol is "b"; X in [.15,.4) --> 2nd symbol is "b"
    process continues until "!" is decoded
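    The interval narrowing can be traced with exact fractions. The sketch below uses the symbol intervals of this example (widths .1, .5, .1, .2, .1); the function names are invented, and a practical coder would use finite-precision integer arithmetic instead of `Fraction`. Scaling [.1,.6) into [.1,.6) gives [.15,.4) for "bb", and the full message "bbd!" ends in [.37,.375).

```python
from fractions import Fraction as F

# Sub-intervals of [0,1) assigned to the symbols according to their probabilities.
INTERVALS = {
    "a": (F(0), F(1, 10)),
    "b": (F(1, 10), F(6, 10)),
    "c": (F(6, 10), F(7, 10)),
    "d": (F(7, 10), F(9, 10)),
    "!": (F(9, 10), F(1)),      # end-of-word symbol
}

def encode_interval(message):
    """Narrow [0,1) symbol by symbol; any X in the result encodes the message."""
    low, width = F(0), F(1)
    for symbol in message:
        s_low, s_high = INTERVALS[symbol]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def decode(x, end="!"):
    """Find the sub-interval containing x, emit its symbol, rescale, repeat."""
    out = []
    while True:
        for symbol, (s_low, s_high) in INTERVALS.items():
            if s_low <= x < s_high:
                out.append(symbol)
                x = (x - s_low) / (s_high - s_low)
                break
        if out[-1] == end:
            return "".join(out)

print(encode_interval("bb"))      # [3/20, 2/5)   = [.15, .4)
print(encode_interval("bbd!"))    # [37/100, 3/8) = [.37, .375)
print(decode(F(372, 1000)))       # a value inside the final interval
```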


    LZW, Code Books, VLWs, VLIs

    LZW (Lempel-Ziv-Welch):
    - e.g.: how to code "whoiswho"? might be thought of as follows:
    - start: create table entry for w
    - then: create table entry for h, but also for "wh" (multi-symbol)
    - then: create table entry for o, but also for "who" (etc. until max-length)
    - multi-symbol entries which repeat "often" survive, others are overwritten

    Code-Books: used in many compression schemes
    3 basic possibilities (imagine in above example):
    - "fixed": all implementations of the codec have (1..n) pre-defined code-books
    - "pre-computed":
      encoder pass 1: compute codebook, store/transmit upfront
      encoder pass 2: encode (compress) data using codebook
    - "dynamic": code-book grows / changes during compression (LZW)
      needs "same procedure" for encoder, decoder
      either: "pieces" of code-book are intertwined with code as they are generated / changed
      or: rules are such that (dynamic) codebook contents can be derived from encoded (compressed) data by decoder

    VLWs / VLIs: variable length words / integers
    - similar to Huffman, but decoder cannot detect end-of-symbol
    - e.g., 1="0", 2="01", 3="11", ... (useful?? see JPEG etc.)
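    For comparison with the informal description of LZW on this slide, here is a sketch of the common textbook variant. It differs from the slide's wording in two assumed details: the table starts with all 256 single-character entries, and entries are never overwritten. It shows the "dynamic" code-book case in which the decoder rebuilds the table from the compressed data alone.

```python
def lzw_compress(text):
    """Emit table indices; the table grows by one multi-symbol entry per output."""
    table = {chr(i): i for i in range(256)}   # assumption: 8-bit characters only
    current = ""
    out = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate               # keep extending the match
        else:
            out.append(table[current])
            table[candidate] = len(table)     # new multi-symbol entry
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decompress(codes):
    """Rebuild the same table while decoding; no code-book is transmitted."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):              # entry the encoder has just created
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)

codes = lzw_compress("whoiswho")
print(codes)                   # the second "wh" is sent as one table index (256)
print(lzw_decompress(codes))
```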


    6. Hybrid Coding: Basic Encoding Steps

    [Figure: encoding pipeline for audio and video]
    source data --> data preparation --> data processing --> quantization --> entropy encoding --> compressed data

    data preparation: lossy; e.g. resolution, frame rate
    data processing: lossless (sometimes lossless); e.g. DCT, sub-band coding
    quantization: lossy; e.g. linear, DC, AC values
    entropy encoding: lossless; e.g. runlength, Huffman


    7. JPEG

    JPEG: Joint Photographic Experts Group

    International Standard:

    For digital compression and coding of continuous-tone still images:

    Gray-scale

    Color

    Since 1992

    Joint effort of:

    ISO/IEC JTC1/SC2/WG10

    Commission Q.16 of CCITT SGVIII

    Compression rate of 1:10 yields reasonable results


    JPEG

    Very general compression scheme

    Independence of:

    Image resolution

    Image and pixel aspect ratio

    Color representation

    Image complexity and statistical characteristics

    Well-defined interchange format of encoded data

    Implementation in:

    Software only

    Software and hardware

    MOTION JPEG for video compression

    Sequence of JPEG-encoded images


    JPEG - Compression Steps

    source image --> image preparation --> image processing --> quantization --> entropy encoding --> compressed image

    image preparation: pixel, block, MCU
    image processing: predictor, FDCT
    entropy encoding: runlength, Huffman, Arithm.

    MCU: Minimum Coded Unit
    FDCT: Forward Discrete Cosine Transformation


    JPEG - Image Preparation

    Planes:

    1 <= N <= 255 components Ci (e.g., one plane per color)
    Different resolution of individual components possible

    Pixel resolution:
    8 or 12 bit per pixel in lossy modes
    2 to 16 bit per pixel in lossless mode

    [Figure: components C1, C2, ..., CN; component Ci has Xi columns and Yi lines; data units are ordered line by line, from left to right and top to bottom]

    data units: samples in lossless mode, blocks with 8x8 pixels in other modes


    JPEG - Image Preparation

    Example: 4:2:2 YUV, 4:1:1 YUV, and YUV9 Coding

    Luminance (Y): brightness
    - sampling frequency 13.5 MHz

    Chrominance (U, V): color differences
    - sampling frequency 6.75 MHz
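    The effect of chrominance subsampling on the data volume can be estimated per pixel. The sketch assumes 8 bit per sample and the usual reading of the three formats (4:2:2 and 4:1:1 share one U and one V sample among 2 and 4 pixels respectively; YUV9 shares them among a 4 x 4 block); the helper name is made up.

```python
def bits_per_pixel(y_samples, u_samples, v_samples, pixels, bits_per_sample=8):
    """Average bits per pixel for a group of `pixels` pixels."""
    return (y_samples + u_samples + v_samples) * bits_per_sample / pixels

FORMATS = {
    "4:4:4 (no subsampling)": bits_per_pixel(4, 4, 4, pixels=4),
    "4:2:2": bits_per_pixel(4, 2, 2, pixels=4),
    "4:1:1": bits_per_pixel(4, 1, 1, pixels=4),
    "YUV9": bits_per_pixel(16, 1, 1, pixels=16),
}

for name, bpp in FORMATS.items():
    print(f"{name:24s} {bpp:4.1f} bit/pixel")
```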


    JPEG - Image Preparation

    Non-interleaved encoding:
    [Figure: data units of a single component, processed from left to right and top to bottom]

    Interleaved encoding:
    [Figure: data units of components C1, C2, C3 with different resolutions, taken in turn]

    Minimum Coded Unit (MCU):
    Combination of interleaved data units of different components


    JPEG - Baseline Mode

    Baseline mode is mandatory for all JPEG implementations:

    Often restricted to certain resolution
    Often only three planes with predefined color set-up

    Image preparation:
    Step 1a: Pixel resolution --> multiples of p=8 bit
    yields 8 x 8 pixel blocks (data units)
    Step 1b: unsigned --> signed integer (prepare for "oscillation" --> sin/cos)
    ... other steps see below
    Step 4a: zigzag linearization (see below)
    Steps 4b, c, ...: several entropy coding algorithms applied

    [Figure: source image --> 1: image preparation (8x8 blocks) --> 2: image processing (FDCT) --> 3: quantization (tables) --> 4: entropy encoding (tables) --> compressed image]


    Intuitive Understanding of DCT

    Fourier-Transform (& FFT "fast" algorithm) known from 1-dimensional:
    - cut waveform into pieces (blocks of samples)
    - for each block:
      interpret as periodic (infinite) oscillating waveform
      represent as sum of sin/cos waves a_i sin(ω_i t); i=0...(N-1); same for cos
    - a_i: coefficients; a_0 = DC (direct current = shift wrt. 0-axis),
      others: how much of the respective sin or cos wave is part of the waveform
    - ω_i: increasing frequencies (usually N = no. of samples in block)

    DCT in JPEG etc.:
    - same idea, but 2-dimensional cos-waves
    - cut out square blocks from picture (NxN)
    - cos waves all have independent frequencies in horizontal/vertical direction
    - comparable to smooth hills, # of valleys may differ horiz./vert.
    - again: interpret sample as periodic (2D) waveform
      --> represent as sum of (2D) cos wave "hill areas"
    - why only cos?? trick: picture swapped around axes
      --> 4fold size --> picture symmetric to axes --> sin parts become zero
    - 4fold size no problem: 3 parts redundant
    - axes have double "weight" (pix. row/col. "0") --> factor Cu/Cv in formula


    JPEG - Baseline Mode: Image Processing

    Forward Discrete Cosine Transformation (FDCT):

    $S_{vu} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$

    with:
    $C_u, C_v = \frac{1}{\sqrt{2}}$ for u, v = 0; else $C_u, C_v = 1$

    Formula applied to each block for all 0 <= u, v <= 7:
    Blocks with 8x8 pixel result in 64 DCT coefficients:
    1 DC-coefficient S00: basic color of the block
    63 AC-coefficients: (likely) zero or near-zero values

    Different significance of the coefficients:
    DC: most important
    AC: less important
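    A direct, unoptimized transcription of the FDCT formula for one 8x8 block may help to see what is computed. It is a sketch only (real codecs use fast factorizations); the function name is invented, `block[y][x]` stands for the sample s_yx and the result `coeffs[v][u]` for S_vu.

```python
import math

def fdct_8x8(block):
    """Forward DCT of one 8x8 block, computed straight from the definition."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs

# A uniformly colored block has only a DC component: S00 = 8 x sample value.
flat = [[10] * 8 for _ in range(8)]
S = fdct_8x8(flat)
print(round(S[0][0], 6))                       # DC coefficient
print(max(abs(S[v][u]) for v in range(8) for u in range(8) if (u, v) != (0, 0)))
```

    For a block of constant value the double sum collapses to 64 times that value, and with C_u = C_v = 1/sqrt(2) the DC coefficient becomes 8 times the sample value, while all AC coefficients vanish up to rounding error.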


    JPEG - Baseline Mode: Image Processing

    FDCT transforms:
    blocks into blocks
    not pixels into pixels

    Example:
    Calculation of S00

    [Figure: every sample of the 8x8 input block contributes to the single coefficient S00 of the 8x8 output block]


    JPEG - Baseline Mode: Quantization

    Use of quantization tables for the DCT-coefficients:

    Map interval of real numbers to one integer number

    Allows to use different granularity for each coefficient

    JPEG: Quantization Effect

    [Figure (a): effect of quantization on an image]


    JPEG - Baseline Mode: Entropy Coding

    63 AC coefficients:
    Ordering in zig-zag form
    reason: coefficients in lower right corner are likely to be zero

    Huffman coding of all coefficients:
    Transformation into a code where the number of bits depends on the frequency of the respective value
    Subsequent runlength coding of zeros

    [Figure: zig-zag scan of the 8x8 coefficient block, starting at DC and AC01 and ending at AC77; the corner coefficients AC07 and AC70 are labelled]


    JPEG: Details of (one possible) Entropy coding

    Treatment of "zig-zag sequence":
    - differential coding of DC: DCi stored as "change wrt. DCi-1"
    - assumption: there will rarely be two non-zero AC values in sequence
      --> regard seq. as iteration of non-zero AC-values and zero-runlengths
      --> sometimes, the zero-runlength will have "length zero"
    - code non-zero AC-values as VLIs --> need to transmit VLI-lengths
      (remember: this is not Huffman --> end of code not found by decoder)
    - create pairs (zero-runlength, VLI-length-of-following-non-zero-AC-value)
    - these pairs are Huffman encoded
    - the very first "pair" is not a pair, but the VLI-length of the (diff.) DC-value
    - the block is finally represented as iteration
      Huffman-encoded pair / VLI-encoded non-zero-AC / Huffman-.... / VLI... / ...
      preceded by "Huffman-encoded VLI-length / VLI-encoded diff.-DC"

    The next two slides give an example of the DCT coding of an 8x8 block


    JPEG: Sample Compression of 1 Block: 8x8 Matrices

    1. Typical Pixel Block:

    139 144 149 153 155 155 155 155
    144 151 153 156 159 156 156 156
    150 155 160 163 158 156 156 156
    159 161 162 160 160 159 159 159
    159 160 161 162 162 155 155 155
    161 161 161 161 160 157 157 157
    162 162 161 163 162 157 157 157
    162 162 161 161 163 158 158 158

    2. DCT Coefficients:

    235.6   1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
    -22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
    -10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
     -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
     -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
      1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
     -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
     -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

    3. Quantization Matrix:

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    16  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

    4. Quantized Result:

    15   0  -1   0   0   0   0   0
    -2  -1   0   0   0   0   0   0
    -1  -1   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
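    The step from the DCT coefficients to the quantized result is an element-wise division by the quantization matrix followed by rounding. The sketch below redoes that step with the numbers from this slide; the names DCT, Q and quantize are chosen here for illustration only.

```python
# DCT coefficients of the sample block (values as given on the slide).
DCT = [
    [235.6, 1.0, -12.1, -5.2, 2.1, -1.7, -2.7, 1.3],
    [-22.6, -17.5, -6.2, -3.2, -2.9, -0.1, 0.4, -1.2],
    [-10.9, -9.3, -1.6, 1.5, 0.2, -0.9, -0.6, -0.1],
    [-7.1, -1.9, 0.2, 1.5, 0.9, -0.1, 0.0, 0.3],
    [-0.6, -0.8, 1.5, 1.6, -0.1, -0.7, 0.6, 1.3],
    [1.8, -0.2, 1.6, -0.3, -0.8, 1.5, 1.0, -1.0],
    [-1.3, -0.4, -0.3, -1.5, -0.5, 1.7, 1.1, -0.8],
    [-2.6, 1.6, -3.8, -1.8, 1.9, 1.2, -0.6, -0.4],
]

# Quantization matrix (values as given on the slide).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [16, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(dct, q):
    """Divide every coefficient by its quantization step and round to an integer."""
    return [[round(c / step) for c, step in zip(dct_row, q_row)]
            for dct_row, q_row in zip(dct, q)]


quantized = quantize(DCT, Q)
for row in quantized:
    print(row)
```

    The decoder can only multiply back by the step size (for the DC value, 15 * 16 = 240 instead of 235.6), so this rounding is where the loss of the lossy mode is introduced.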


    JPEG: Sample Compression (contd.)

    assume: last DC value was 12 --> encoded difference is 15 - 12 = 3

    --> only 3, -2, -1 occur as non-zero values. Their VLI-encoding is as follows:

    value   VLI
    3       11
    -2      01
    -1      0

    This makes the iteration look as follows (VLIs still represented as integers):

    (2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0) (
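    The iteration above can be reproduced from the zig-zag ordered quantized block. The following is a simplified sketch: the function name run_length_pairs is hypothetical, (0, 0) is used as the end-of-block marker as in JPEG, and the special handling that real JPEG needs for zero runs longer than 15 is left out.

```python
def run_length_pairs(dc_diff, ac):
    """Build the symbol sequence for one block.

    dc_diff: differential DC value
    ac:      the 63 AC values in zig-zag order
    Returns a list of (symbol, value) entries: first ((size,), dc_diff),
    then ((zero_run, size), value) per non-zero AC value, then ((0, 0), None).
    """
    def size(v):
        return abs(v).bit_length()  # VLI length of v

    symbols = [((size(dc_diff),), dc_diff)]
    run = 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            symbols.append(((run, size(v)), v))
            run = 0
    if run:  # only zeros remain --> end-of-block marker
        symbols.append(((0, 0), None))
    return symbols


# AC values of the sample block in zig-zag order, padded with zeros to 63 entries.
ac_values = [0, -2, -1, -1, -1, 0, 0, -1] + [0] * 55
symbols = run_length_pairs(3, ac_values)
for symbol, value in symbols:
    print(symbol, value)
```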


    JPEG - 4 Modes of Compression

    lossy sequential DCT-based mode (baseline mode)

    extended lossy DCT-based mode

    lossless mode

    hierarchical mode

    JPEG - Extended Lossy DCT-Based Mode


    Pixel resolution 8 to 12 bit

    Sequential image display: Top to bottom

    Good for small images and fast processing

    Progressive image display:

    Coarse to fine

    Good for large and complicated images


    JPEG - Lossless Mode


    Image preparation:

    On pixel basis (2-16 bit/pixel)

    Image processing:

    Selection of a predictor for each pixel

    Entropy coding:

    Same as lossy mode

    Code of chosen predictor and its difference to the actual value

    Neighborhood of the pixel x to be predicted:

    c b
    a x

    code    prediction
    0       no prediction
    1       x = A
    2       x = B
    3       x = C
    4       x = A + B - C
    5       x = A + ((B - C) / 2)
    6       x = B + ((A - C) / 2)
    7       x = (A + B) / 2
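    A minimal sketch of the predictor selection, assuming the neighborhood shown above (a to the left, b above, c above left) and integer arithmetic with floor division. The function name predict and the sample values are illustrative; the values are taken from the upper left corner of the pixel block used in the lossy example.

```python
def predict(code, a, b, c):
    """Return the prediction for pixel x from its neighbors a, b and c."""
    predictors = {
        0: 0,                  # no prediction
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[code]


# Neighborhood   c b  =  139 144
#                a x     144 151
a, b, c, x = 144, 144, 139, 151
residuals = {code: x - predict(code, a, b, c) for code in range(8)}
print(residuals)  # the encoder stores the predictor code and this difference
```

    Only the difference between x and its prediction is entropy coded, so a predictor that tracks the local gradient well (here code 4) leaves small values to encode.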

    JPEG - Hierarchical Mode


    Coding of each image with several resolutions:

    Image scaling

    Differential encoding

    First, coded with lowest resolution - image A

    Coded with increased horizontal & vertical resolution - image A'

    Difference between both images is computed - B = A' - A (*)

    Iteration for higher resolutions

    Features:

    Requires more storage and higher data rate

    Fast decoding process

    Used for scalable video

    Similar to Photo-CD (Kodak, proprietary)

    (*) note for all scalable approaches:

    compute the higher-res difference B relative to the receiver's decoded lower-res version of A, not relative to the original A (to avoid accumulation of quantization errors)
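    The note on scalable approaches can be illustrated with a one-dimensional toy example. Everything here is an assumption for illustration: pairs of samples are averaged for downscaling, samples are repeated for upscaling, and a coarse uniform quantizer (step 8) stands in for the lossy coding of the low-resolution image.

```python
def downsample(row):
    """Halve the resolution by averaging neighboring pairs."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


def upsample(row):
    """Double the resolution by repeating every sample."""
    return [v for v in row for _ in range(2)]


def lossy(row, step=8):
    """Stand-in for lossy coding and decoding: uniform quantization."""
    return [step * round(v / step) for v in row]


original = [139, 144, 149, 153, 155, 155, 155, 155]

# Low-resolution image as the receiver will see it after decoding.
decoded_a = lossy(downsample(original))

# The difference is taken against the decoded version, not the original one.
prediction = upsample(decoded_a)
b = [o - p for o, p in zip(original, prediction)]

# Receiver side: decoded low-res image plus difference.
reconstructed = [p + d for p, d in zip(prediction, b)]
print(decoded_a, b, reconstructed)
```

    Because the encoder predicts from exactly what the receiver has, the quantization error of the low-resolution layer is absorbed by B instead of being carried into the higher resolutions.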

    8. H.261 and related ITU Standards


    Video codec for audiovisual services at p x 64 kbit/s ("p-times-sixtyfour", where p means "multiples-of"):

    CCITT standard from 1990

    For ISDN

    With p = 1, ..., 30

    Technical issues:

    Real-time encoding/decoding

    Max. signal delay of 150 ms

    Constant data rate

    Implementation in hardware (main goal) and software

    H.261 - Image Preparation


    Fixed source image format

    Image components:

    Luminance signal (Y)

    Two color difference signals (Cb, Cr)

    Subsampling according to CCIR 601 (4:1:1)

    Quarter Common Intermediate Format (QCIF) resolution: Mandatory

    Y: 176 x 144 pixel ("pruning" 180 --> 176)

    At 29.97 frames/s appr. 9.115 Mbps (uncompressed)

    but: encoder may leave out up to 3 frames between two transmitted frames (--> ~8 fps)

    Common Intermediate Format (CIF) resolution: Optional

    Y: 352 x 288 pixel

    At 29.97 frames/s appr. 36.46 Mbps (uncompressed), i.e. ~ 570 * 64 kbps

    Layered structure:

    Block of 8 x 8 pixels

    Macroblock of: 4 Y blocks, 1 Cr block, 1 Cb block

    Group of blocks (GOBs) of 3 x 11 macroblocks

    Picture:

    QCIF picture: 3 GOBs

    CIF picture: 12 GOBs

    [Figure: picture areas of CIF (360*288) and QCIF]
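    The uncompressed data rates and the GOB counts can be checked with a short calculation. The sketch assumes 8 bit per sample and that Cb and Cr each carry a quarter of the luminance samples, which matches the macroblock layout of 4 Y blocks, 1 Cr block and 1 Cb block; the function names are hypothetical.

```python
def uncompressed_rate(width, height, fps=29.97, bits=8):
    """Uncompressed bit rate in bit/s for Y plus subsampled Cb and Cr."""
    luma = width * height
    chroma = 2 * (luma // 4)  # Cb and Cr, a quarter of the luminance samples each
    return (luma + chroma) * bits * fps


def macroblocks(width, height):
    """Number of 16x16 macroblocks in a picture."""
    return (width // 16) * (height // 16)


qcif = uncompressed_rate(176, 144)
cif = uncompressed_rate(352, 288)
print(round(qcif / 1e6, 3), "Mbit/s for QCIF")
print(round(cif / 1e6, 2), "Mbit/s for CIF, i.e.", round(cif / 64000), "x 64 kbit/s")

gob = 3 * 11  # macroblocks per group of blocks
print(macroblocks(176, 144) // gob, "GOBs for QCIF")
print(macroblocks(352, 288) // gob, "GOBs for CIF")
```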

    H.261 - Image Compression


    Intraframe coding:

    yields "reference frame" f0

    DCT w/ same quantization factor for all AC values

    this factor may be adjusted by loopback filter (see below)

    Interframe coding, motion estimation:

    interframes: f1,f2,f3,... relative to f0 (differential encoding)

    in H.261: intraframes rare (bandwidth!, main application videophone)

    Search of similar macroblock (16x16) in previous image

    Position of this macroblock defines motion vector

    Search range is up to the implementation:

    max. 15 pixel

    but: motion vector may also always be 0 ("bad" software encoder)

    [Figure: matching macroblocks in Frame 1 and Frame 2, connected by the motion vector]

    note about motion vector mv:

    - mv points "backwards" in time (pos. of object in f...)

    - mv related to block, not moving object
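    H.261 leaves the search strategy to the encoder. One simple possibility, shown here only as a sketch, is an exhaustive search that minimizes the sum of absolute differences (SAD); the function names, the SAD criterion and the small synthetic frames are assumptions for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(cur, ref, bx, by, size=16, search=15):
    """Exhaustive block matching; returns the displacement (dx, dy) into the
    previous frame ref that best matches the block at (bx, by) in cur."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would reach outside the reference frame.
            if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# Synthetic test: the content of ref moves 3 pixels right and 2 pixels down.
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[y - 2][x - 3] if y >= 2 and x >= 3 else 0 for x in range(16)]
       for y in range(16)]
mv = find_motion_vector(cur, ref, 8, 8, size=4, search=4)
print(mv)  # the vector points back to where the block came from
```

    The result is a property of the block, not of any object in the scene: the encoder simply reports the best matching position in the previous frame.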


    Interframe coding, further steps:

    Results:

    Difference between similar macroblocks

    Motion vector

    Difference of macroblocks:

    DCT if value higher than a specific threshold (hybrid DPCM/DCT!)

    No further processing if value less than this threshold

    Motion vector:

    Components are coded yielding code words of variable length

    Quantization:

    Linear

    Adaptation of step size ("loopback filter") => ~ constant data rate

    ("leaky bucket": constant 64 kbps "drop out"; loopback filter: adjust quantization factor if bucket filled above threshold 1 or below threshold 2, respectively)
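    The interplay of leaky bucket and step-size adaptation can be simulated with a toy model. All numbers are hypothetical: the thresholds, the drain per frame, the quantizer range of 1 to 31, and in particular the assumption that the number of bits per frame is the frame's "complexity" divided by the quantization factor.

```python
def simulate(complexities, drain_per_frame, q=8, low=4000, high=12000):
    """Simulate buffer fill level and quantization factor frame by frame.

    complexities:    hypothetical measure of how many bits a frame needs at q = 1
    drain_per_frame: bits leaving the bucket per frame at the constant channel rate
    Returns a list of (q, fill) after each frame.
    """
    fill, history = 0, []
    for complexity in complexities:
        bits = complexity // q  # assumed model: coarser step --> fewer bits
        fill = max(0, fill + bits - drain_per_frame)
        if fill > high and q < 31:
            q += 1              # bucket too full --> quantize more coarsely
        elif fill < low and q > 1:
            q -= 1              # bucket nearly empty --> quantize more finely
        history.append((q, fill))
    return history


busy = simulate([200000] * 10, drain_per_frame=6400)
calm = simulate([8000] * 10, drain_per_frame=6400)
print(busy[-1], calm[-1])
```

    In this model a busy scene drives the quantization factor up (lower quality, fewer bits), while a calm scene lets it fall back, which is how the data rate stays roughly constant.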

    Further ITU Video Schemes (H.263, H.3xx)


    H.263

    extension to H.261

    max. bitrate: H.263 approx. 2.5 x H.261; lowest bitrates suitable f. modem

    Source Image Formats

    Format   Pixels        H.261 (encoder/decoder)   H.263 (encoder/decoder)
    SQCIF    128 x 96      optional                  required
    QCIF     176 x 144     required                  required
    CIF      352 x 288     optional                  optional
    4CIF     704 x 576     not defined               optional
    16CIF    1408 x 1152   not defined               optional


    H.320, H.32x Family


    H.320: specifies (as overview) videophone for ISDN

    H.310: adapts MPEG-2 for communication over B-ISDN (ATM)

    H.321: defines videoconferencing terminal for B-ISDN (instead of N-ISDN)

    H.322: adapts H.320 for guaranteed-QoS LANs (like ISO-Ethernet)

    H.323: videoconferencing over non-guaranteed LANs

    H.324: terminal for low bit rate communication (over V.34 modems)


    MPEG - Video: Preparation Step


    Fixed image format

    Color subsampling: Y, Cr, Cb 4:2:0

    Resolution:

    Should be at most 768 x 576 pixel

    8 bit/pixel in each layer (i.e., for Y, Cr, Cb)

    14 pixel aspect ratios

    8 frame rates

    No user-defined MCU (unlike JPEG)

    No progressive mode (unlike JPEG)

    MPEG - Video: Processing Step


    4 types of frames:

    I-frames (intra-coded frames):

    Like JPEG

    Real-time decoding demands

    P-frames (predictive coded frames):

    Reference to previous I- or P-frames

    Motion vector

    MPEG does not define how to determine the motion vector

    difference of similar macroblocks is DCT coded

    DC and AC coefficients are runlength coded

    B-frames (bi-directional predictive coded frames):

    Reference to previous and subsequent (I or P) frames

    Interpolation between macro blocks

    D-frames (DC-coded frames):

    Only the DC coefficients of the DCT are coded

    For fast forward and rewind

    MPEG - Video Coding


    Sequence of I-, P-, and B-frames:

    Sequence:

    Defined by application

    E.g., I B B P B B P B B I B B P B B P B B

    Order of transmission is different: I P B B ...

    [Figure: sequence of I-, P- and B-frames along the time axis t, with arrows marking the references; legend: I-Frames (Intracoded), P-Frames (Predictive Coded), B-Frames (Bidirectionally Coded), (D-Frames (DC Coded))]
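    The difference between display order and order of transmission follows from the references: a B-frame can only be decoded once the following I- or P-frame is available. A minimal sketch of the reordering, with a hypothetical function name and frames labeled by type and display index:

```python
def transmission_order(display):
    """Reorder frame types from display order into a possible transmission order.

    Every I- or P-frame is sent before the B-frames that precede it in
    display order, because those B-frames reference it.
    """
    out, pending_b = [], []
    for index, frame_type in enumerate(display):
        frame = frame_type + str(index)
        if frame_type == "B":
            pending_b.append(frame)  # has to wait for its later reference frame
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames would wait for the next reference frame in a real stream.
    return out + pending_b


order = transmission_order("IBBPBBPBBI")
print(" ".join(order))
```

    The decoder therefore has to buffer the reference frames and put the pictures back into display order, which adds delay.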

    MPEG - Video: Implications


    Random access

    at I-frames

    at P-frames: i.e. decode previous I-frame first

    at B-frame: i.e. decode I and P-frames first

    Editing

    decoded data

    loss of quality (encode -> decode -> encode -> ...)

    application of all video editing functions

    encoded data (previous to entropy encoding)

    preservation of quality

    transition effects as function in the DCT domain

    morphing, non-block conform overlay very difficult

    encoded data

    preservation of quality

    today: too complex, if possible at all, i.e. need for entropy decoding

    MPEG - Audio Coding: Fundamentals


    Masking threshold in the frequency domain

    narrowband random noise

    depends on frequency

    [Figure: sound pressure level (dB, 0 to 80) over frequency (0.02 to 20 kHz), showing the absolute threshold of hearing and the masking patterns for fm = 0.25, 1 and 4 kHz]


    Masking in Time Domain

    after and before the event

    depends (to some extent) on amplitude

    [Figure: masking level (0 to 60) over time in ms around a masker, with the regions pre-masking, simultaneous masking and post-masking]

    MPEG - Audio Coding


    Yields: heavily asymmetric codecs!!

    Audio channel:

    Between 32 and 448 kbit/s

    In steps of 16 kbit/s

    Definition of 3 "layers" of quality:

    "higher layer" means "more complex" & "can handle lower layers"

    Layer 1: max. 448 Kbit/s (ca. 1:4 compression, e.g. used as PASC in DCC)

    Layer 2: max. 384 Kbit/s (ca. 1:6-8, common, e.g. as MUSICAM in DAB)

    Layer 3: max. 320 Kbit/s (ca. 1:10-12, the famous MP3)

    [Figure: encoder block diagram: sub-band coding (32 sub-bands) --> quantization --> entropy coder & frame packing; the psychoacoustical model controls how many bits are reserved for which sub-band]
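    To put the compression ratios into perspective, they can be applied to an uncompressed stereo signal. The sketch assumes CD-DA parameters (44.1 kHz, 16 bit/sample, 2 channels) as the reference; the function names are hypothetical.

```python
def pcm_rate(sample_rate_hz=44100, bits=16, channels=2):
    """Uncompressed PCM bit rate in bit/s."""
    return sample_rate_hz * bits * channels


def compressed_kbit(ratio):
    """Bit rate in kbit/s after compressing CD-DA stereo by 1:ratio."""
    return pcm_rate() / ratio / 1000


print(pcm_rate() / 1000, "kbit/s uncompressed")
for layer, ratio in (("Layer 1", 4), ("Layer 2", 6), ("Layer 2", 8),
                     ("Layer 3", 10), ("Layer 3", 12)):
    print(layer, "at 1:{}".format(ratio), round(compressed_kbit(ratio), 1), "kbit/s")
```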


    Sampling compatible to encoding of CD-DA and DAT:

    Sampling rates: 32 kHz, 44.1 kHz, 48 kHz

    Sampling precision:

    16 bit/sample

    Audio channels:

    Mono (single, 1 channel)

    Stereo (2 channels)

    dual channel mode (independent, e.g., bilingual)

    optional: joint stereo (exploits redundancy and irrelevancy)

    Application Example: DAB (Digital Audio Broadcasting)

    uses MPEG layer 2 (compression also known as MUSICAM = Masking pattern adapted Universal Subband Integrated Coding And Multiplexing)

    delays, for VLSI implementation:

    max. 30 ms encoding

    max. 10 ms decoding

    SW codec delays vary for different layers, implementations, computers (rule-of-thumb may be 50/100/150 ms for layer 1/2/3, which makes MP3 rather inappropriate for real-time conversation)

    MPEG - Audio and Video Data Streams


    Audio Data Stream Layers:

    1. Frames

    2. Audio access units

    3. Slots

    Video Data Stream Layers:

    1. Video sequence layer

    2. Group of pictures layer

    3. Single picture layer

    4. Slice layer

    5. Macroblock layer

    6. Block layer

    10. MPEG-2

    Follow-Up MPEG Standards


    MPEG-2:

    Higher data rates for high-quality audio/video

    Multiple layers and profiles

    MPEG-3:

    Initially HDTV

    MPEG-2 scaled up to subsume MPEG-3

    MPEG-4:

    Initially, lower data rates for e.g. mobile communication

    then: focus on coding & additional functionalities based on image contents

    MPEG-7 (EC = "experimental core" status):

    Content description

    Basis for search and retrieval

    See section on databases

    MPEG-21 (upcoming):

    Framework for multimedia business, delivery ... what's missing?

    maybe eCommerce focus --> e.g., security, watermarking?


    From MPEG-1 to MPEG-2

    Improvement in quality from VCR to TV to HDTV

    No CD-ROM based constraints

    higher data rates

    MPEG-1: about 1.5 Mbit/s

    MPEG-2: 2-100 Mbit/s

    Evolution

    1994: International Standard

    Also later known as H.262

    Prominent role for digital TV in DVB (digital video broadcasting)

    commercial MPEG-2 realizations available

    MPEG-2 Video


    Inclusion of interlaced video format

    Increased resolution, more than CCIR 601

    Defined as:

    5 profiles (simple, main,..)

    4 levels (with increasing resolution,...)

    Other additional features

    DCT coefficients may be coded with a non-linear quantization function

    MPEG-2 Video: Scaling


    Motivation

    analog: continuous decrease in quality if errors occur

    digital: need for tolerance whenever errors occur, i.e. scaling

    Option: Spatial scaling

    reduction of resolution

    approach

    image sampled with half resolution, then MPEG algorithms applied, output processed with better FEC (base layer)

    image decoded, subtracted from original, MPEG algorithms applied to the difference, output processed with worse FEC (enhanced layer)

    Option: Signal to Noise (SNR) scaling

    noise introduced by quantization errors and visible block structures

    approach

    Base layer: DCT output, more significant bits encoded with better FEC

    Enhanced layer: DCT output, less significant bits encoded with worse FEC
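    The SNR scaling approach can be sketched as a split of each DCT value into more significant and less significant bits. This is a simplified illustration, not the MPEG-2 bitstream syntax: the function names, the choice of 3 low-order bits and the midpoint substitution for a lost enhancement layer are assumptions.

```python
def split_layers(coeff, low_bits=3):
    """Split an integer DCT value into base-layer and enhancement-layer parts."""
    base = coeff >> low_bits                     # more significant bits
    enhancement = coeff & ((1 << low_bits) - 1)  # less significant bits
    return base, enhancement


def merge_layers(base, enhancement=None, low_bits=3):
    """Rebuild the value; without the enhancement layer use the interval midpoint."""
    if enhancement is None:
        enhancement = 1 << (low_bits - 1)
    return (base << low_bits) | enhancement


base, enhancement = split_layers(-22)
print(merge_layers(base, enhancement))  # both layers received: exact value
print(merge_layers(base))               # enhancement layer lost: coarse value
```

    If only the well protected base layer arrives, the decoder still obtains a usable, coarser value, so quality degrades gradually instead of failing completely.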

    MPEG-2 Video Profiles and Levels


    LEVELS and PROFILES (maximum data rates)

    Level (pixels/line x lines)   Simple      Main        SNR Scalable   Spatial Scalable   High
                                  Profile     Profile     Profile        Profile            Profile
    High Level (1920 x 1152)      -           80 Mbit/s   -              -                  100 Mbit/s
    High-1440 Level (1440 x 1152) -           60 Mbit/s   -              60 Mbit/s          80 Mbit/s
    Main Level (720 x 576)        15 Mbit/s   15 Mbit/s   15 Mbit/s      -                  20 Mbit/s
    Low Level (352 x 288)         -           4 Mbit/s    4 Mbit/s       -                  -

    B-frames                      No          Yes         Yes            Yes                Yes
    Color subsampling             4:2:0       4:2:0       4:2:0          4:2:0              4:2:0 or 4:2:2
    Scalability                   Not         Not         SNR            SNR Scalable or    SNR Scalable or
                                  Scalable    Scalable    Scalable       Spatial Scalable   Spatial Scalable

    MPEG-2 Audio

    (two modest) extensions to MPEG-1 audio:


    1) "low sample rate extension" (LSE):

    1/2 of all MPEG-1 rates: 16, 22.05, 24 kHz

    quantization down to 8 bits/sample

    2) "multichannel extension": more channels, i.e. up to

    5 full bandwidth channels (surround system):

    left and right front

    center (in front)

    left and right back

    "matrixing": rule for backward compatible conversion --> stereo (x, y = 0.71)

    Left for Stereo: $\mathit{Left}_{stereo} = \mathit{Left}_f + x \cdot \mathit{Center} + y \cdot \mathit{Left}_b$

    Right for Stereo: $\mathit{Right}_{stereo} = \mathit{Right}_f + x \cdot \mathit{Center} + y \cdot \mathit{Right}_b$

    option: +1 "low freq. extension" (LFE) channel for subwoofer

    "multilingual extension": 7 more, i.e. up to 12 channels (multiple languages, commentary)

    compatibility with MPEG-1:

    all MPEG-1 audio formats can be processed by MPEG-2

    only 3 MPEG-2 audio codecs do not provide backward compatibility

    MPEG-2 System

    Steps


    1. Audio and video combined to Packetized Elementary Stream (PES)

    2. PES(es) combined to Program Stream or Transport Stream

    Program stream:

    Error-free environment

    Packets of variable length

    One single stream with one timing reference

    Transport stream:

    Designed for noisy (lossy) media channels

    Multiplex of various programs with one or more time bases

    Packets of 188 byte length

    Conversion between Program and Transport Streams possible

    11. MPEG-4

    Goals


    MPEG-4 (ISO 14496) originally:

    Targeted at systems with very scarce resources

    To support applications like

    Mobile communication

    Videophone and E-mail

    Max. data rates and dimensions (roughly):

    Between 4800 and 64000 bits/s

    176 columns x 144 lines x 10 frames/s

    Largely covered by H.263, therefore re-orientation:

    Goal to provide enhanced functionality

    to allow for analysis and manipulation of image contents

    MPEG-4: Schedule for Standardization

    1993: Work started

    1997: Committee Draft

    1998: Final Committee Draft

    1998: Draft International Standard

    1999-2000: International Standard

    MPEG-4: Goals (cont.)

    1: support composite multimedia i.e. find standardized ways to


    Represent units of aural, visual or audiovisual content: "audio/visual objects" or AVOs

    object coding independent of other objects, surroundings and background

    Compose these objects together

    i.e. creation of compound objects that form audiovisual scenes

    Multiplex and synchronize the data associated with AVOs

    for transportation over network channels providing QoS (Quality-of-Service)

    2: support synthetic objects

    computer-gen. (VR), synthesized (txt2speech), model-based ("face")

    3: support truly interactive applications (more than play/pause/rewind..)

    Interact with the audiovisual scene generated at the decoder's site

    [Figure: example scene with video objects 1, 2, 3 and audio objects 1 and 2 (speech: "Rhubarb")]

    MPEG-4: Scope

    Definition of


    System Decoder Model

    specification for decoder implementations

    Description language

    binary syntax of an AVO's bitstream representation

    scene description information

    Corresponding concepts, tools and algorithms, especially for

    content-based compression of simple and compound audiovisual objects

    manipulation of objects

    transmission of objects

    random access to objects

    animation

    scaling

    error robustness


    MPEG-4: Scope (cont.)

    Targeted bit rates for video and audio:


    VLBV core (Very Low Bit-rate Video)

    5 - 64 Kbit/s

    image sequences with up to CIF resolution and up to 15 frames/s

    Higher-quality video

    64 Kbit/s - 4 Mbit/s

    quality like digital TV

    Natural audio coding

    2 - 64 Kbit/s

    MPEG-4: Video and Image Encoding

    Encoding / decoding of


    Rectangular images and video

    coding similar to MPEG-1/2

    motion prediction

    texture coding

    Images and video of arbitrary shape

    as done in conventional approach

    8x8 DCT or shape-adaptive DCT

    plus coding of shape and transparency information

    Encoder

    Must generate timing information

    speed of the encoder clock = time base

    desired decoding times and/or expiration times

    by using time stamps attached to the stream

    Can specify the minimum buffer resources needed for decoding


    MPEG-4: Composition of Scenes

    Scene description includes:


    Tree to define hierarchical relationships between objects

    Objects' positions in space and time

    by converting the object's local coordinate system into a global coordinate system

    Attribute value selection

    e.g. pitch of sound, color, texture, animation parameters

    Description based on some VRML concepts

    VRML = Virtual Reality Modelling Language

    Interaction with scenes

    e.g. change viewing point, drag object, start/stop streams, select language
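    The tree and the local-to-global conversion above can be sketched as follows. This is a minimal illustration, not the MPEG-4 scene description format: the class `SceneNode`, the function `global_positions`, and the object names are hypothetical, and only 2D translation is modelled (no rotation, scaling, or time).

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One node of a scene tree: a primitive AVO (leaf) or a compound object."""
    name: str
    offset: tuple = (0.0, 0.0)  # position in the parent's coordinate system
    children: list = field(default_factory=list)


def global_positions(node, origin=(0.0, 0.0)):
    """Walk the tree and convert every local offset into global coordinates."""
    x = origin[0] + node.offset[0]
    y = origin[1] + node.offset[1]
    positions = {node.name: (x, y)}
    for child in node.children:
        positions.update(global_positions(child, (x, y)))
    return positions


# Hypothetical scene: a compound object "person" made of two primitive AVOs.
person = SceneNode("person", (10.0, 5.0), [
    SceneNode("sprite", (0.0, 0.0)),
    SceneNode("voice", (1.0, 2.0)),
])
scene = SceneNode("scene", (0.0, 0.0), [person, SceneNode("background")])
positions = global_positions(scene)
```

    Moving the compound object (changing the offset of `person`) moves all of its children, which is the point of describing positions relative to the parent.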

    MPEG-4: Example of a Composition

    [Figure: example scene in which primitive AVOs are grouped into compound objects]

    MPEG-4: Scaling

    Three approaches:


    Spatial scalability: decoder displays textures and visual objects at a reduced spatial resolution

    by decoding only a subset of the total bit stream

    32 levels max. for textures and still images

    3 levels max. for video sequences

    Temporal scalability: decoder displays video at a reduced temporal resolution

    by decoding only a subset of the total bit stream

    3 levels max.

    Quality scalability

    bit stream is parsed into a number of bit stream layers of different bit rates

    either during transmission or in the decoder

    subset of the layers still yields a meaningful signal

    Spatial and temporal scaling both for

    conventional rectangular display and objects with arbitrary shape
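    Temporal scalability can be illustrated with a small sketch. It assumes a dyadic layering (each additional layer doubles the frame rate); this layering rule and the names `temporal_layer` and `decode_subset` are illustrative assumptions, not taken from the MPEG-4 specification.

```python
def temporal_layer(index, levels):
    """Layer of frame `index` in a dyadic hierarchy with `levels` layers."""
    for layer in range(levels):
        step = 2 ** (levels - 1 - layer)
        if index % step == 0:
            return layer
    raise AssertionError("unreachable: the last layer has step 1")


def decode_subset(frames, levels, max_layer):
    """Keep only the frames that belong to layers 0..max_layer."""
    return [f for i, f in enumerate(frames)
            if temporal_layer(i, levels) <= max_layer]


frames = list(range(8))  # stand-ins for 8 coded frames
base_only = decode_subset(frames, levels=3, max_layer=0)
two_layers = decode_subset(frames, levels=3, max_layer=1)
all_layers = decode_subset(frames, levels=3, max_layer=2)
```

    With 3 levels the decoder can show a quarter, half, or all of the frames, always from a subset of the same stream.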

    MPEG-4: Synthetic Objects

    Visual objects:


    Human face

    start object: neutral-expression face

    animated via FDPs and/or FAPs

    FAP (facial animation parameter): animates the current display

    FDP (facial definition parameter): alternative shape/texture

    Mesh + texture mapping: for 2D & 3D meshes

    2D mesh may also be used for human face animation, see above

    only triangular 2D meshes, vertices may be moved (mv!), texture is warped

    e.g. virtual background

    Texture coding for view-dependent applications

    texture, e.g. virtual background; decoder/encoder loop for "minimal" transmission

    Audio objects:

    Text-to-speech

    speech generation from given text and prosodic parameters

    face animation control

    Score-driven synthesis

    music generation from a score

    more general than MIDI

    Special effects

    MPEG-4: Layered Networking Architecture

    Display / Recording


    Media Coding / Decoding (CoDecs)

    Access Units: e.g. video or audio frames or scene description commands

    Adaptation Layer

    Elementary Streams: A/V object data + stream type info, sync. info, QoS req., ...

    FlexMux Layer (Flexible Multiplexing): e.g. multiple elementary streams with similar QoS requirements

    Multiplexed Streams

    TransMux Layer (Transport Multiplexing):

    - only interface specified

    - layer itself can be any network, e.g. RTP/UDP/IP, AAL5/ATM

    Network or Local Storage

    MPEG-4: Layered Networking Architecture (cont.)

    DMIF: Delivery Multimedia Integration Framework

    Allows multi-party sessions to be established


    interaction with

    remote interactive peers

    broadcast systems

    storage systems

    establishment of channels with specific QoSs and bandwidths

    Controls

    FlexMux layer

    TransMux layer

    MPEG-4: Error Handling

    Mobile communication:

    Low bit-rate (< 64 Kbps)


    Error-prone

    MPEG-4 concepts for error handling:

    Resynchronization

    enables receiver to tune in again

    based on markers within bitstream

    Data recovery

    enables receiver to reconstruct lost data

    encode data in an error-resilient manner

    Error concealment

    enables receiver to bridge gaps in data

    e.g. by repeating parts of old frames
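    The simplest form of error concealment, repeating old data, might look like the sketch below. It works on whole frames for brevity, whereas a real decoder would conceal parts of frames; the function name `conceal` and the use of `None` for a lost frame are assumptions made for this example.

```python
def conceal(frames):
    """Replace lost frames (None) by the most recent correctly received frame."""
    out, last_good = [], None
    for frame in frames:
        if frame is None:
            frame = last_good  # repeat old data; stays None if nothing arrived yet
        else:
            last_good = frame
        out.append(frame)
    return out


received = ["A", None, None, "B", None]
shown = conceal(received)
```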

    12. Wavelets

    Motivation


    JPEG / DCT problems:

    DCT not applicable to whole image, but only to small blocks

    --> block structure becomes visible at high compression ratios

    Scaling as add-on

    --> additional effort

    DCT function is fixed

    --> cannot be adapted to source data

    Improvements by using Wavelets:

    Transformation of the whole image

    --> overcomes visible block structures and introduces inherent scaling

    Better identification of which data is relevant to human perception

    --> higher compression ratio

    Wavelets: Compression / Decompression

    Compressor


    Forward Wavelet Transformation --> Quantizer --> Encoder

    Decompressor

    Decoder --> DeQuantizer --> Inverse Wavelet Transformation

    The same overall structure as for DCT-based algorithms

    But: important differences in the transformation step

    Wavelets: Fundamental Idea

    Image is transformed into the frequency domain (as in JPEG)


    But: based on Wavelet functions instead of cosine functions

    Advantage: Wavelets are 0 outside a limited interval

    --> Wavelet automatically relates only to a part of the image

    --> Image need not be split into blocks

    "Frequencies"??? :

    Use Wavelet family: $\{2^{-j/2}\,\psi(2^{-j}x - k)\}$, $j, k \in \mathbb{Z}$, $\psi$ being a Wavelet

    [Figure: cosine function (extends indefinitely) compared with an example Wavelet (limited interval)]
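    To make the Wavelet family concrete, the sketch below evaluates its members for one particular choice of mother Wavelet. The Haar Wavelet is used here only because it is the simplest example; the slides do not say which Wavelet is meant, and the function names are hypothetical.

```python
def haar(x):
    """Haar mother Wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def family_member(j, k, x):
    """Scaled and shifted Wavelet 2^(-j/2) * psi(2^(-j) * x - k)."""
    return 2.0 ** (-j / 2) * haar(2.0 ** (-j) * x - k)


inside = family_member(0, 0, 0.25)   # x lies in the support [0, 1)
stretched = family_member(1, 0, 1.5)  # j = 1 stretches the support to [0, 2)
outside = family_member(0, 3, 0.25)  # k = 3 shifts the support to [3, 4)
```

    The index j plays the role of "frequency" (scale), k selects the position, and every member is zero outside its own limited interval.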

    Wavelets: Transformation Steps

    "Discrete Wavelet Transformation" (Mallat, 1989)


    Split image recursively by using high and low pass filters

    [Figure: filter cascade. The image is read by line (horizontal operations) and by column (vertical operations). L = Low Pass + downsampling, H = High Pass + downsampling. Lower frequencies give the transformed image $c_1$ with reduced size; higher frequencies give $d_1^1$, $d_1^2$, $d_1^3$; the cascade then continues (...).]

    Wavelets: Transformation Steps (cont.)

    In each step i:

    Three images $d_i^x$ (x=1,2,3):


    containing the high frequency parts of the image

    representing "details" of the image

    submitted to Wavelet transformation

    or thrown away in case of scaling

    One image $c_i$:

    containing the lower frequency parts of the image

    representing the original image with less detail / at a lower resolution

    submitted to step i+1

    Up to here: 4 images with 1/4 resolution each --> no compression!

    but again: decorrelation: many coefficients in d-images (close to) zero

    Afterwards:

    Quantization

    Entropy encoding

    as with DCT
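    One transformation step can be sketched with the simplest possible filter pair. The code assumes Haar filters in unnormalized form (pairwise average as low pass, pairwise half-difference as high pass) and an image with even side lengths; the slides do not prescribe a filter, and all function names are hypothetical.

```python
def haar_1d(row):
    """Low pass and high pass of one row, each downsampled by 2."""
    low = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    high = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_rows(image):
    """Apply haar_1d to every row; returns the low and the high image."""
    lows, highs = [], []
    for row in image:
        low, high = haar_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def haar_step(image):
    """One step: returns c and the three detail images, each 1/4 of the size."""
    low, high = filter_rows(image)          # horizontal operations
    ll, lh = filter_rows(transpose(low))    # vertical operations on low part
    hl, hh = filter_rows(transpose(high))   # vertical operations on high part
    return transpose(ll), [transpose(lh), transpose(hl), transpose(hh)]


image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
c, details = haar_step(image)
```

    For this piecewise-constant image, `c` is the half-resolution version and all three detail images are zero, which is the decorrelation effect that the subsequent quantization and entropy encoding exploit. Applying `haar_step` to `c` again gives the next step.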

    Wavelets: DWT compared with DCT

    Advantages of DWT over DCT:

    No block artefacts


    Inherent scaling

    based on the $d_i^x$ for i=1,2,3,...

    Lower time complexity for the transformation

    DCT: O(n*log n)

    DWT: O(n)

    (n = number of values to be transformed)

    Higher flexibility: Wavelet function can be freely chosen

    (but: how to choose?)
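    The linear bound for the DWT follows from a geometric series. The derivation below is a sketch under two assumptions that are not stated on the slide: the filters have a fixed length, so one step costs at most $c$ operations per input value, and step $i+1$ processes only the low-pass image, which holds a quarter of the values of step $i$. The symbols $c$, $k$ (number of steps) and $T_{\mathrm{DWT}}$ are introduced here for illustration.

```latex
\begin{align*}
T_{\mathrm{DWT}}(n)
  &\le \sum_{i=0}^{k-1} c\,\frac{n}{4^{i}}
   < c\,n \sum_{i=0}^{\infty} 4^{-i}
   = \frac{4}{3}\,c\,n
   = O(n)
\end{align*}
```

    The total work is therefore less than 4/3 of the cost of the first step, regardless of how many steps are performed.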

    Wavelets: Further Issues

    Edge detection reduces high frequencies:

    First extract detected edges


    Then apply wavelets to such a filtered image

    Application to video:

    [Figure: image sequence ..., $I_{n-2}$, $I_{n-1}$, $I_n$ over time t --> compute differences ..., $I_{n-1} - I_{n-2}$, $I_n - I_{n-1}$ --> Wavelet compressor]
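    The difference step in front of the Wavelet compressor can be sketched as below. The compressor itself is left out; images are plain lists of rows, and the function names are hypothetical.

```python
def difference(prev, cur):
    """Pixel-wise difference cur - prev of two equally sized images."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def add(prev, diff):
    """Inverse of difference(): rebuild the current image."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(prev, diff)]


def encode_sequence(frames):
    """Keep the first image; replace each later one by its difference."""
    return frames[0], [difference(a, b) for a, b in zip(frames, frames[1:])]


def decode_sequence(first, diffs):
    frames = [first]
    for diff in diffs:
        frames.append(add(frames[-1], diff))
    return frames


frames = [
    [[10, 10], [10, 10]],
    [[10, 12], [10, 10]],
    [[10, 12], [11, 10]],
]
first, diffs = encode_sequence(frames)
restored = decode_sequence(first, diffs)
```

    Because successive images are similar, the difference images consist mostly of zeros, which is what makes them cheap to compress.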

    13. Fractal Image Compression

    Fractal Geometry was first applied to image generation


    remember "Mandelbrot" images

    recursive construction of images

    infinite granularity (i.e. zoom-in), but compact "image data" (formula)

    (such forms are called fractals)

    $Z_i = \mathrm{RealConst} \cdot Z_{i-1} + \mathrm{ComplexConst}$

    Use of Fractals for Compression??? Overview (1)

    observation: self-similarities in natural images(clouds, dunes, beaches: zoom-in reveals similar forms as large image)


    idea: can natural images be described with fractal geometry?

    first published by Barnsley & Sloan (88), first implementation 89 by Arnaud Jacquin

    Key #1: Iterated Function Systems (IFS):

    input (sub-)picture subject to mathematical transformation of type

    $\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}$

    picture moved, rotated / mirrored, and contracted

    --> all transformations are "contractions"

    Key #2: Banach's Fixed Point Theorem:

    apply a set Wimg={Wi} of contractions to an image

    after infinitely many applications, a specific image appears

    ... called "attractor" or "fractal"

    this process is independent of initial "start" image!!

    human perception: iteration can stop "pretty soon" (finite no. of iterations)

    Q: how to find Wimg such that attractor is image-to-be-compressed?

    Use of Fractals for Compression??? Overview (2)

    Key #3: Collage Theorem:

    in order to find Wimg as above: search Wimg such that image is (almost) transformed into itself!


    First algorithm published (Jacquin):

    partition image into (small, non-overlapping) "range blocks"

    search (larger, overlapping) "domain blocks" which can be "contracted" into range blocks

    for each range block, find domain block and contraction (lots of possibilities!!)

    details / simplifications of Jacquin's approach: see below

    To apply self-similarity: Image Generation

    Examples

    (from TUD + Univ. Bochum)

    for recursive


    construction of images

    Sierpinski triangle

    to produce self-similar structures

    infinite steps applied to different source images lead to same result

    known as Sierpinski triangle

    "Grenzwert" (limit) also known as attractor
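    That different source images lead to the same result can be checked numerically. The sketch below uses the usual three contractions of the Sierpinski triangle (move every point halfway towards one corner); the corner coordinates, the two start points, and the function names are arbitrary choices for this example.

```python
import math

# Corners of the triangle; each defines one contraction with factor 1/2.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def apply_ifs(points):
    """Apply all three contractions to every point (one construction step)."""
    return [((x + vx) / 2, (y + vy) / 2)
            for (x, y) in points
            for (vx, vy) in VERTICES]


def iterate(start, steps):
    points = [start]
    for _ in range(steps):
        points = apply_ifs(points)
    return points


# Two very different "source images", here reduced to a single point each.
a = iterate((0.0, 0.0), 6)
b = iterate((5.0, 7.0), 6)

# Largest distance between corresponding points of the two results.
gap = max(math.dist(p, q) for p, q in zip(a, b))
```

    The gap halves with every step (here it is the initial distance divided by 2^6), so in the limit both starts yield the same attractor.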

    To Find Self-Similarities

    affine function allows for

    translation

    rotation


    scaling

    brightness (/color) adaptation

    IFS: Iterative Function System

    ideally completely self-similar

    example see right

    PIFS: Partitioned Iterative Function System

    real images are not completely self-similar

    Wimg?

    Theoretical Basis

    Banach's Fixed Point Theorem:

    Let F be a metric space

    Let W: F --> F be a contractive mapping


    i.e. there exists an s, $0 \le s < 1$, such that $d(W(x), W(y)) \le s \cdot d(x, y)$ for all $x, y \in F$ (d: metric of F)


    Decompression: Apply Wimg iteratively to any image

    --> easy

    Stop when error falls below some bound
    
    Error can be calculated by "Collage Theorem"
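    Decompression by iteration can be sketched on a toy example. The "image" has only four samples and the code table `CODE` is invented for illustration; each entry says which pair of samples acts as domain, and which contrast and brightness are applied. As a stand-in for the Collage Theorem error estimate, the loop stops when two successive iterates differ by less than a bound; with a contraction factor of 0.5 the remaining distance to the attractor is then no larger than that last change.

```python
# Hypothetical toy code: (domain_start, contrast, brightness) per sample.
CODE = [(0, 0.5, 10.0), (2, 0.5, 20.0), (0, 0.5, 30.0), (2, 0.5, 40.0)]


def apply_w(signal):
    """One application of W_img: every sample becomes a contracted copy
    of the average of a 2-sample domain, plus a brightness offset."""
    return [s * (signal[d] + signal[d + 1]) / 2 + o for d, s, o in CODE]


def decode(start, bound=1e-6, max_iter=200):
    """Iterate W_img until successive results differ by less than bound."""
    current = list(start)
    for n in range(1, max_iter + 1):
        nxt = apply_w(current)
        change = max(abs(a - b) for a, b in zip(nxt, current))
        current = nxt
        if change < bound:
            return current, n
    return current, max_iter


from_black, steps_black = decode([0.0] * 4)
from_white, steps_white = decode([255.0] * 4)
```

    Both start images converge to the same attractor, approximately [30, 50, 50, 70], after a few dozen iterations.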

    How to Find Wimg? Jacquin's Approach

    Systematic search based on "Partitioned Iterative Function System (PIFS)"

    Partition image into "range blocks" Ri


    8*8 pixel blocks

    non-overlapping

    Consider all "domain blocks" Dj of double size

    16*16 pixel blocks

    overlapping

    Find for each Ri the most similar Dj

    consider rotations (0°/90°/180°/270°) and mirroring

    adapt brightness and contrast of Dj to that of Ri

    translation, rotation, mirroring, brightness/contrast adaptation define a (partial) affine function

    Combine partial functions to Wimg

    Compression rate? Example: for each (8*8) range block:

    contraction factor fixed

    3 bit for transformation

    16 bit for domain block coordinates

    12 bit for brightness/contrast adaptation

    --> factor is 8x8x8 : 31 = 512 : 31 (cf. JPEG example)
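    The search for the most similar domain block can be sketched as follows. This is a simplified illustration, not the published algorithm: rotations and mirroring are left out, the contrast is not restricted to keep the mapping contractive, the blocks are 2*2 and 4*4 instead of 8*8 and 16*16, and all function names are hypothetical. Brightness and contrast are fitted by least squares. The last lines redo the arithmetic of the example above, assuming 8 bit per pixel.

```python
def block(img, top, left, size):
    """Cut a size x size block out of an image given as a list of rows."""
    return [row[left:left + size] for row in img[top:top + size]]


def shrink(dom):
    """Contract a domain block to half its side length (2x2 averaging)."""
    half = len(dom) // 2
    return [[(dom[2 * i][2 * j] + dom[2 * i][2 * j + 1]
              + dom[2 * i + 1][2 * j] + dom[2 * i + 1][2 * j + 1]) / 4
             for j in range(half)] for i in range(half)]


def fit(dom, rng):
    """Least-squares contrast s and brightness o with s * dom + o ~ rng."""
    d = [v for row in dom for v in row]
    r = [v for row in rng for v in row]
    mean_d, mean_r = sum(d) / len(d), sum(r) / len(r)
    var = sum((x - mean_d) ** 2 for x in d)
    s = sum((x - mean_d) * (y - mean_r) for x, y in zip(d, r)) / var if var else 0.0
    o = mean_r - s * mean_d
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def best_domain(img, rng, dsize):
    """Exhaustive search over all (overlapping) domain block positions."""
    best = None
    for top in range(len(img) - dsize + 1):
        for left in range(len(img[0]) - dsize + 1):
            s, o, err = fit(shrink(block(img, top, left, dsize)), rng)
            if best is None or err < best["err"]:
                best = {"err": err, "top": top, "left": left, "s": s, "o": o}
    return best


# Toy image with a linear brightness ramp.
image = [[float(i + j) for j in range(6)] for i in range(6)]
match = best_domain(image, block(image, 0, 0, 2), dsize=4)

# Arithmetic of the example: 8*8 pixels at 8 bit versus 3 + 16 + 12 bit.
raw_bits = 8 * 8 * 8
code_bits = 3 + 16 + 12
ratio = raw_bits / code_bits  # roughly 16.5 : 1
```

    On the ramp image the range block is reproduced exactly from the contracted domain block at the top-left corner with contrast 0.5 and brightness -0.5. The nested loops also show why encoding is expensive: every range block is compared with every domain position.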


    14. Basic Audio and Speech Coding Schemes

    Voice encoder/decoder: "vocoder"

    Background: ITU-driven activities


    G.711: PCM

    with 64 kbps

    G.722: differential PCM (DPCM)

    48, 56, 64 kbps

    G.723

    Multipulse Maximum Likelihood Quantizer (MP-MLQ): 6.3 kbps

    Algebraic Code-Excited Linear Prediction (ACELP): 5.3 kbps

    application: speech
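    The listed bit rates can be put in relation with a few lines of arithmetic. The 64 kbps of G.711 correspond to 8000 samples per second at 8 bit each; the dictionary keys below are just labels chosen for this sketch.

```python
# G.711: 8000 samples/s * 8 bit/sample = 64 kbps
g711_kbps = 8000 * 8 / 1000

# Bit rates in kbps as listed above (lowest G.722 mode shown).
rates = {
    "G.722 (48 kbps mode)": 48.0,
    "G.723 MP-MLQ": 6.3,
    "G.723 ACELP": 5.3,
}

# Reduction factor of each scheme relative to G.711 PCM.
ratios = {name: g711_kbps / kbps for name, kbps in rates.items()}
```

    Relative to plain PCM, the two G.723 modes reduce the bit rate by a factor of roughly 10 and 12, respectively.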

    Schemes for Speech Coding