Lecture 4 Predictive Codingmapl.nctu.edu.tw/course/vc_2008/files/lecture4.pdf · Introduction...

Lecture 4 Predictive Coding

Wen-Hsiao Peng, Ph.D

Multimedia Architecture and Processing Laboratory (MAPL), Department of Computer Science, National Chiao Tung University

March 2008

Wen-Hsiao Peng, Ph.D (NCTU CS) MAPL March 2008 1 / 53

Background

Introduction



Digital Communication System

Separation of Source and Channel Coding

Optimal under: Point-to-Point Communication, Ergodic Channel, Infinite Delay

Source Coding - Source Statistics and Distortion Measure

Channel Coding - Channel Statistics

Claude Shannon (http://en.wikipedia.org/wiki/Claude_Shannon)



Why Video Compression?

Video Raw Data Rate

Format | Data Rate | Size/Hour
A: QCIF (176x144), 4:2:0, 30P (3G Phone) | 9.1 Mbps | 4 GB
B: CIF (352x288), 4:2:0, 30P (VCD) | 37 Mbps | 16 GB
C: BT.601 (720x480), 4:2:2, 60I (DVD) | 166 Mbps | 74 GB
D: 1080I (1920x1080), 4:2:0, 60I (HDTV) | 746 Mbps | 335 GB

Transmission Capacity

Network | Capacity | FPS w/o Compression
V.90 Modem | 56 kbps (Down), 33 kbps (Up) | A: 0.18
3G | 128-384 kbps (Car-Pedestrian) | B: 0.1-0.31
ADSL | Typical 1-2 Mbps | B: 0.8-1.6, C: 0.18-0.36
Ethernet LAN | Max. 100 Mbps, Typical 10-20 Mbps | C: 1.8-3.6, D: 0.4-0.8
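The raw rates and frame rates in these tables follow from simple arithmetic. A minimal sketch, assuming 8 bits per sample and counting 60I (60 interlaced fields/s) as 30 full frames/s; both are my assumptions, chosen because they reproduce the table values. The function names are hypothetical.

```python
# Luma + chroma samples per pixel for each chroma format.
CHROMA_FACTOR = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

def raw_bitrate(width, height, chroma, frames_per_second, bits_per_sample=8):
    """Uncompressed video bit rate in bits/s."""
    return width * height * CHROMA_FACTOR[chroma] * bits_per_sample * frames_per_second

def gigabytes_per_hour(bits_per_second):
    """Storage needed for one hour of video, in 10^9 bytes."""
    return bits_per_second * 3600 / 8 / 1e9

def fps_over_link(link_bps, width, height, chroma, bits_per_sample=8):
    """Frames per second a link can carry without any compression."""
    bits_per_frame = width * height * CHROMA_FACTOR[chroma] * bits_per_sample
    return link_bps / bits_per_frame

# 60I (60 fields/s) is counted as 30 full frames/s.
qcif = raw_bitrate(176, 144, "4:2:0", 30)
bt601 = raw_bitrate(720, 480, "4:2:2", 30)
hd1080 = raw_bitrate(1920, 1080, "4:2:0", 30)
print(f"QCIF: {qcif / 1e6:.1f} Mbps, {gigabytes_per_hour(qcif):.1f} GB/hour")
print(f"BT.601: {bt601 / 1e6:.0f} Mbps, 1080I: {hd1080 / 1e6:.0f} Mbps")
print(f"QCIF over a 56 kbps modem: {fps_over_link(56_000, 176, 144, '4:2:0'):.2f} fps")
```

Under these assumptions CIF comes out at about 36.5 Mbps, which the table rounds to 37 Mbps.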



Approach

Lossless - Reversible, Low Compression

Exploit spatiotemporal correlation + non-uniform symbol distribution

Lossy - Irreversible, High Compression

Introduce imperceptible fidelity loss

Most lossy systems also include a lossless compression stage


Linear Correlation

Spatiotemporal Correlation

[Figure: Frame (t-1) and Frame (t), showing temporal correlations between frames and spatial correlations within a frame]



Linear Correlation

Linear Correlation between random variables X and Y

$(Y - \mu_Y) = \alpha (X - \mu_X)$, where α can be any value

Correlation Coefficient

$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E((X - \mu_X)(Y - \mu_Y))}{\sqrt{E((X - \mu_X)^2)}\sqrt{E((Y - \mu_Y)^2)}}$

Strength and direction of linear correlation

[Figure: scatter plots, Linearly Increasing ($\rho_{X,Y} = 1$), Linearly Decreasing ($\rho_{X,Y} = -0.4$), Linearly Uncorrelated ($\rho_{X,Y} = 0$), Linearly Uncorrelated ($\rho_{X,Y} = 0$)]



Linear Correlation

Correlation Coefficient $\rho_{X,Y}$ detects only linear correlation

Given (1) X is uniformly distributed over (-1, +1) and (2) $Y = X^2$ is completely determined by X

$\Rightarrow \rho_{X,Y} = 0$
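The zero correlation follows from the symmetry of X alone. A short worked derivation using only the two givens above:

```latex
\operatorname{cov}(X,Y) = E(XY) - E(X)\,E(Y) = E(X^3) - 0 \cdot E(X^2)
  = \int_{-1}^{+1} x^3 \cdot \tfrac{1}{2}\, dx = 0
\;\Rightarrow\; \rho_{X,Y} = 0
```

Y is a deterministic function of X, yet the dependence is invisible to $\rho_{X,Y}$ because it is not linear.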

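Sample Correlation Coefficient $r_{X,Y}$

Estimate $\rho_{X,Y}$ based on N realizations of (X, Y)

$r_{X,Y} = \frac{S_{X,Y}}{\sqrt{S_X^2}\,\sqrt{S_Y^2}}$, where $S_{X,Y} = \frac{1}{N-1}\sum (x_i - \bar{x})(y_i - \bar{y})$, $S_X^2 = \frac{1}{N-1}\sum (x_i - \bar{x})^2$, $S_Y^2 = \frac{1}{N-1}\sum (y_i - \bar{y})^2$

$r_{X,Y} \to \rho_{X,Y}$ as $N \to \infty$

$\bar{x}$ and $\bar{y}$ (sample means) are unbiased estimators of the means; $S_{X,Y}$, $S_X^2$, and $S_Y^2$ are unbiased estimators of the covariance and variances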
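A minimal sketch of $r_{X,Y}$ in plain Python (no numerical libraries assumed); the 1/(N-1) factors cancel in the ratio but are kept to mirror the formula. It also checks the $Y = X^2$ example numerically. The function name and the test data are illustrative.

```python
import math
import random

def sample_correlation(xs, ys):
    """Sample correlation coefficient r_{X,Y} = S_XY / (S_X * S_Y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s2_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return s_xy / math.sqrt(s2_x * s2_y)

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
r_linear = sample_correlation(xs, [3.0 * x + 1.0 for x in xs])  # exact linear relation
r_square = sample_correlation(xs, [x * x for x in xs])          # Y = X^2
print(round(r_linear, 6), round(r_square, 3))
```

The linear case gives r essentially equal to 1, while $Y = X^2$ gives r close to 0 despite full dependence.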



Unbiased Estimation of Variance

$S_X^2 = \frac{1}{N-1}\sum (x_i - \bar{x})^2$ is an unbiased estimator of $\sigma_X^2$:

$E(S_X^2) = E((X - \mu_X)^2)$, where $S_X^2$ is a random variable (R.V.)

$S_Y^2 = \frac{1}{N-1}\sum (y_i - \bar{y})^2$ is an unbiased estimator of $\sigma_Y^2$:

$E(S_Y^2) = E((Y - \mu_Y)^2)$

$S_{X,Y} = \frac{1}{N-1}\sum (x_i - \bar{x})(y_i - \bar{y})$ is an unbiased estimator of $\operatorname{cov}(X,Y)$:

$E(S_{X,Y}) = E((X - \mu_X)(Y - \mu_Y))$
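Why 1/(N-1) rather than 1/N: a small Monte Carlo sketch, assuming Gaussian samples with true variance 1 and N = 5 (both arbitrary choices). Averaged over many trials, the 1/(N-1) estimator lands near 1, while the 1/N version lands near (N-1)/N = 0.8.

```python
import random

def variance_estimates(sample):
    """Return (unbiased S^2 with 1/(N-1), biased version with 1/N)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((v - mean) ** 2 for v in sample)
    return ss / (n - 1), ss / n

rng = random.Random(1)
N, trials = 5, 20_000
unbiased_sum = biased_sum = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]  # true variance = 1
    s2_unbiased, s2_biased = variance_estimates(sample)
    unbiased_sum += s2_unbiased
    biased_sum += s2_biased

mean_unbiased = unbiased_sum / trials  # estimate of E(S_X^2)
mean_biased = biased_sum / trials
print(mean_unbiased, mean_biased)
```

The gap comes from using the sample mean $\bar{x}$ in place of $\mu_X$, which removes one degree of freedom.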



Autocorrelation Function

Characterizes how a random signal (process) varies w.r.t. time/space

Detects periodic components and the fundamental frequency

Continuous¹

$R_{xx}(t_1, t_2) = E(X(t_1)X(t_2))$

$R_{xx}(\tau) = R_{xx}(t, t+\tau)$ if X(t) is W.S.S.

Discrete

$R_{xx}[n_1, n_2] = E(X[n_1]X[n_2])$

$R_{xx}[m] = R_{xx}[n, n+m]$ if X[n] is W.S.S.

¹Wide-Sense Stationary (W.S.S.) Random Signal X(t):

$E(X(t)) = E(X(t+\tau))\ \forall \tau \in \mathbb{R}$

$E(X(t_1)X(t_2)) = E(X(t_1+\tau)X(t_2+\tau))\ \forall \tau \in \mathbb{R}$



Estimation of Autocorrelation Function

Estimate $R_{xx}[m]$ based on a finite record of a random signal X[n]

v[n], a finite record of X[n]:

$v[n] = \begin{cases} X[n], & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$

$R_{xx}[m]$ estimator: $\hat{R}_{xx}[m] = \frac{1}{L - |m|}\, C_{vv}[m]$

$C_{vv}[m] = \sum_{n=0}^{L-1} v[n]\,v[n+m] = \begin{cases} \sum_{n=0}^{L-1-|m|} X[n]\,X[n+|m|], & |m| \le L-1 \\ 0, & \text{otherwise} \end{cases}$

Unbiased Estimator

$E(\hat{R}_{xx}[m]) = R_{xx}[m]$ for $|m| \le L-1$
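A minimal sketch of the unbiased estimator $\hat{R}_{xx}[m] = C_{vv}[m]/(L - |m|)$ in plain Python; the function name and the cosine test record are illustrative. For a record with period 8, the estimate returns to its zero-lag value at m = 8, which is how the autocorrelation exposes the fundamental period.

```python
import math

def autocorr_unbiased(x, m):
    """Unbiased estimate R_hat[m] = C_vv[m] / (L - |m|) from the finite record x."""
    L = len(x)
    k = abs(m)
    if k > L - 1:
        raise ValueError("lag must satisfy |m| <= L - 1")
    c_vv = sum(x[n] * x[n + k] for n in range(L - k))
    return c_vv / (L - k)

# Periodic record with period 8 (assumed test signal).
period = 8
x = [math.cos(2 * math.pi * n / period) for n in range(64)]
r = [autocorr_unbiased(x, m) for m in range(0, 17)]
print([round(v, 3) for v in r])
```

Note that the variance of $\hat{R}_{xx}[m]$ grows as |m| approaches L - 1, since fewer products are averaged.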


Estimation of Autocorrelation Function

$R_{xx}[m_x, m_y]$ Estimator (2-D Case)

$\hat{R}_{xx}[m_x, m_y] = \frac{1}{L_x - |m_x|}\,\frac{1}{L_y - |m_y|}\, C_{vv}[m_x, m_y]$

$C_{vv}[m_x, m_y] = \begin{cases} \sum_{n_y=0}^{L_y-1-|m_y|} \sum_{n_x=0}^{L_x-1-|m_x|} X[n_x, n_y]\, X[n_x+|m_x|, n_y+|m_y|], & |m_x| \le L_x-1,\ |m_y| \le L_y-1 \\ 0, & \text{otherwise} \end{cases}$

Unbiased Estimator

$E(\hat{R}_{xx}[m_x, m_y]) = R_{xx}[m_x, m_y]$ for $|m_x| \le L_x-1$, $|m_y| \le L_y-1$
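A minimal 2-D sketch for non-negative lags only. I restrict it that way because, for a general 2-D field, $R_{xx}[m_x, -m_y]$ need not equal $R_{xx}[m_x, m_y]$, so treating mixed-sign lags through absolute values would be an extra symmetry assumption. Storing the field as rows, `img[ny][nx]` = $X[n_x, n_y]$, is also my assumption.

```python
def autocorr2d_unbiased(img, mx, my):
    """Unbiased 2-D autocorrelation estimate for lags 0 <= mx, my.

    img[ny][nx] holds X[nx, ny]; the estimate is
    C_vv[mx, my] / ((Lx - mx) * (Ly - my)).
    """
    Ly, Lx = len(img), len(img[0])
    if not (0 <= mx <= Lx - 1 and 0 <= my <= Ly - 1):
        raise ValueError("this sketch handles only 0 <= mx <= Lx-1, 0 <= my <= Ly-1")
    c_vv = 0.0
    for ny in range(Ly - my):
        for nx in range(Lx - mx):
            c_vv += img[ny][nx] * img[ny + my][nx + mx]
    return c_vv / ((Lx - mx) * (Ly - my))

img = [[1, 2],
       [3, 4]]
print(autocorr2d_unbiased(img, 0, 0), autocorr2d_unbiased(img, 1, 0),
      autocorr2d_unbiased(img, 0, 1), autocorr2d_unbiased(img, 1, 1))
```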


Autocorrelation Function

[Figure: autocorrelation function examples, panels: Periodic, PCM, DPCM]


Differential Pulse Code Modulation

Predictive Coding



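Differential Pulse Code Modulation (DPCM)

Input: $x[n]$

Predictor: $\hat{x}_p[n] = f(\{\tilde{x}[k]\})$, built from previously coded values $\tilde{x}[k]$, $k < n$ (how to find $f(\cdot)$?)

Residual: $e[n] = x[n] - \hat{x}_p[n]$

Output: $\tilde{e}[n] = e[n] + q[n]$, where $q[n]$ is the quantization error

Coded Value: $\tilde{x}[n] = \hat{x}_p[n] + \tilde{e}[n]$

Fidelity Loss: $\Delta[n] = \tilde{x}[n] - x[n] = q[n]$

[Figure: Closed-Loop Structure. Encoder: the prediction $\hat{x}_p[n]$ is subtracted from the input $x[n]$ to form the residual $e[n]$, which the quantizer maps to the output $\tilde{e}[n]$; the reconstruction $\tilde{x}[n] = \hat{x}_p[n] + \tilde{e}[n]$ drives the predictor. Decoder: the same predictor, driven by $\tilde{x}[n]$, is added to $\tilde{e}[n]$ to reconstruct $\tilde{x}[n]$.]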
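A minimal closed-loop DPCM sketch. The previous-sample predictor $\hat{x}_p[n] = \tilde{x}[n-1]$, the uniform mid-tread quantizer, $\tilde{x}[-1] = 0$, and the random-walk test signal are all my assumptions; the slides leave $f(\cdot)$ and the quantizer open. It shows the two properties stated above: the decoder reproduces the encoder's reconstruction exactly, and $|\Delta[n]| = |q[n]|$ never exceeds half a step.

```python
import math
import random

def quantize(value, step):
    """Uniform mid-tread quantizer (assumed; the slides do not fix one)."""
    return step * math.floor(value / step + 0.5)

def dpcm_closed_loop(x, step):
    """Closed-loop DPCM with previous-sample predictor x_hat_p[n] = x_tilde[n-1]."""
    residuals_q, recon = [], []
    prev = 0.0                       # x_tilde[-1] assumed to be 0
    for sample in x:
        pred = prev                  # prediction from the previously *coded* value
        e = sample - pred            # residual e[n]
        e_q = quantize(e, step)      # output e_tilde[n] = e[n] + q[n]
        x_tilde = pred + e_q         # coded value
        residuals_q.append(e_q)
        recon.append(x_tilde)
        prev = x_tilde
    return residuals_q, recon

def dpcm_decode(residuals_q):
    """Decoder: the same predictor, driven by its own reconstruction."""
    recon, prev = [], 0.0
    for e_q in residuals_q:
        prev = prev + e_q
        recon.append(prev)
    return recon

rng = random.Random(3)
x, level = [], 0.0
for _ in range(1000):
    level += rng.gauss(0.0, 3.0)     # slowly wandering test signal (assumed)
    x.append(level)

step = 2.0
residuals_q, recon = dpcm_closed_loop(x, step)
max_error = max(abs(a - b) for a, b in zip(recon, x))
print(f"max |Delta[n]| = {max_error:.3f} (bound step/2 = {step / 2})")
```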



PCM vs. DPCM

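PCM SNR²

[Figure: PCM, input $x[n]$ passes through a quantizer to give the output $\tilde{x}[n]$]

$SNR_{PCM} = \frac{\sigma_x^2[n]}{\sigma_{\Delta_1}^2[n]} = \underbrace{\frac{\sigma_x^2[n]}{\sigma_{q_1}^2[n]}}_{\text{Quant. SNR} \,\propto\, \text{Bits}}$

DPCM SNR

$SNR_{DPCM} = \frac{\sigma_x^2[n]}{\sigma_{\Delta_2}^2[n]} = \frac{\sigma_x^2[n]}{\sigma_{q_2}^2[n]} = \underbrace{\frac{\sigma_x^2[n]}{\sigma_e^2[n]}}_{\text{Gain}} \cdot \underbrace{\frac{\sigma_e^2[n]}{\sigma_{q_2}^2[n]}}_{\text{Quant. SNR} \,\propto\, \text{Bits}}$

Optimal Predictor in MSE: $f^*(\cdot) = \arg\min_{f(\cdot)} \sigma_e^2[n]$

²SNR of a (B+1)-bit quantizer with full-scale amplitude $X_m$ and quantization error q: $\frac{\sigma_x^2}{\sigma_q^2} = \frac{12 \cdot 2^{2B} \cdot \sigma_x^2}{X_m^2}$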
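The DPCM advantage is the prediction gain $\sigma_x^2/\sigma_e^2$. A minimal sketch, assuming a unit-innovation AR(1) source $x[n] = \rho x[n-1] + w[n]$ and the predictor $\rho x[n-1]$ (neither is from the slides), for which the gain should be about $1/(1-\rho^2)$. Quantization is ignored when measuring the gain; the footnote formula is evaluated separately.

```python
import math
import random

def quantizer_snr_db(B, sigma_x, X_m):
    """SNR in dB of a (B+1)-bit uniform quantizer: 10 log10(12 * 2^(2B) * sigma_x^2 / X_m^2)."""
    return 10 * math.log10(12 * 2 ** (2 * B) * sigma_x ** 2 / X_m ** 2)

def variance(v):
    mu = sum(v) / len(v)
    return sum((t - mu) ** 2 for t in v) / (len(v) - 1)

# Assumed source model: AR(1), started from its stationary distribution.
rng = random.Random(2)
rho, N = 0.95, 100_000
prev = rng.gauss(0.0, math.sqrt(1.0 / (1.0 - rho ** 2)))
x = []
for _ in range(N):
    prev = rho * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

# Residual of the predictor rho * x[n-1], measured on source data only.
e = [x[n] - rho * x[n - 1] for n in range(1, N)]
gain = variance(x) / variance(e)
theory = 1.0 / (1.0 - rho ** 2)
print(f"prediction gain {10 * math.log10(gain):.2f} dB (theory {10 * math.log10(theory):.2f} dB)")
print(f"8-bit quantizer, X_m = 4 sigma_x: {quantizer_snr_db(7, 1.0, 4.0):.2f} dB")
```

Each extra bit adds $20\log_{10}2 \approx 6.02$ dB to the quantizer term, so a prediction gain of about 10 dB is worth roughly 1.7 bits per sample at the same SNR.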



Open-Loop DPCM

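Input: $x[n]$

Predictor: $\hat{x}^e_p[n] = f(\{x[k]\})$, built from source data

Residual: $e[n] = x[n] - \hat{x}^e_p[n]$

Output: $\tilde{e}[n] = e[n] + q[n]$, where $q[n]$ is the quantization error

Coded Value: $\tilde{x}[n] = \hat{x}^d_p[n] + \tilde{e}[n]$, where the decoder can only form its prediction $\hat{x}^d_p[n]$ from coded data

Fidelity Loss: $\Delta[n] = \tilde{x}[n] - x[n] = q[n] + (\hat{x}^d_p[n] - \hat{x}^e_p[n])$, where, for a previous-sample predictor, the mismatch term equals $\tilde{x}[n-1] - x[n-1]$

Errors accumulate: $\Delta[n] = q[n] + \Delta[n-1]$

[Figure: Open-Loop Structure. The encoder predictor is driven by the source $x[n]$ and outputs $\hat{x}^e_p[n]$; the decoder predictor is driven by the reconstruction $\tilde{x}[n]$ and outputs $\hat{x}^d_p[n]$; the two predictions mismatch.]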
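A minimal open-loop sketch showing the accumulation $\Delta[n] = q[n] + \Delta[n-1]$. As before, the previous-sample predictors, the mid-tread quantizer, zero initial state, and the random-walk input are my assumptions. The encoder predicts from $x[n-1]$ while the decoder can only use $\tilde{x}[n-1]$, so the fidelity loss equals the running sum of quantization errors.

```python
import math
import random

def quantize(value, step):
    """Uniform mid-tread quantizer (assumed; the slides do not fix one)."""
    return step * math.floor(value / step + 0.5)

def dpcm_open_loop(x, step):
    """Open-loop DPCM: encoder predicts x[n-1], decoder predicts x_tilde[n-1]."""
    q_errors, recon = [], []
    prev_source, prev_recon = 0.0, 0.0   # x[-1] and x_tilde[-1] assumed 0
    for sample in x:
        e = sample - prev_source         # residual from the source-based prediction
        e_q = quantize(e, step)          # e_tilde[n] = e[n] + q[n]
        q_errors.append(e_q - e)         # q[n]
        x_tilde = prev_recon + e_q       # decoder: x_hat_d[n] + e_tilde[n]
        recon.append(x_tilde)
        prev_source, prev_recon = sample, x_tilde
    return q_errors, recon

rng = random.Random(4)
x, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 3.0)         # assumed test signal
    x.append(level)

q_errors, recon = dpcm_open_loop(x, step=2.0)
delta = [r - s for r, s in zip(recon, x)]      # fidelity loss Delta[n]
accumulated, total = [], 0.0
for q in q_errors:
    total += q
    accumulated.append(total)                  # q[0] + ... + q[n]
print(f"max |Delta[n]| = {max(abs(d) for d in delta):.2f}")
```

With the same quantizer in a closed loop, $|\Delta[n]|$ would stay within step/2; here it drifts like a random walk.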


Open-Loop vs. Closed-Loop

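Open-Loop SNR (Dynamic), with $\hat{x}^o_p[n] = x[n-1]$:

$SNR_o = \frac{\sigma_x^2[n]}{\sigma_\Delta^2[n]} = \frac{\sigma_x^2[n]}{\sum_{k=1}^{n} \sigma_{q_o}^2[k]} = \frac{\sigma_x^2[n]}{n\,\sigma_{e_o}^2[n]} \cdot \underbrace{\frac{\sigma_{e_o}^2[n]}{\sigma_{q_o}^2[n]}}_{\propto\,\text{Bits}}$

(the last step treats the $q_o[k]$ as uncorrelated with equal variance, so $SNR_o$ decreases as n grows)

Closed-Loop SNR (Static)³, with $\hat{x}^c_p[n] = \tilde{x}[n-1] = x[n-1] + q_c[n-1]$, where $x[n-1]$ and $q_c[n-1]$ are asymptotically uncorrelated:

$SNR_c = \frac{\sigma_x^2[n]}{\sigma_\Delta^2[n]} = \frac{\sigma_x^2[n]}{\sigma_{q_c}^2[n]} = \frac{\sigma_x^2[n]}{\sigma_{e_c}^2[n]} \cdot \frac{\sigma_{e_c}^2[n]}{\sigma_{q_c}^2[n]} = \frac{\sigma_x^2[n]}{\sigma_{e_o}^2[n] + \sigma_{q_c}^2[n-1]} \cdot \underbrace{\frac{\sigma_{e_c}^2[n]}{\sigma_{q_c}^2[n]}}_{\propto\,\text{Bits}}$

For the same number of bits, $SNR_o > SNR_c$ iff $n\,\sigma_{e_o}^2[n] < \sigma_{e_c}^2[n] \;(= \sigma_{e_o}^2[n] + \sigma_{q_c}^2[n-1])$ for n > 1

³$E(x[n]q_c[n]) < E(|q_c[n]|^2)$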
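Rearranging that condition shows when open-loop prediction can win. A short worked step, assuming (as the comparison above does) that both loops spend the same bits, so their quantizer-SNR factors match:

```latex
n\,\sigma_{e_o}^2[n] < \sigma_{e_o}^2[n] + \sigma_{q_c}^2[n-1]
\iff (n-1)\,\sigma_{e_o}^2[n] < \sigma_{q_c}^2[n-1]
\iff n < 1 + \frac{\sigma_{q_c}^2[n-1]}{\sigma_{e_o}^2[n]}
```

So the open loop can only be better over a bounded number of prediction steps, and only when the closed-loop quantization noise is large relative to the open-loop residual. For example, with $\sigma_{q_c}^2 = \sigma_{e_o}^2$ the condition already fails at n = 2.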


Drifting and Accumulation Errors

Image quality gradually blurs under open-loop control

[Figure: Closed-Loop vs. Open-Loop (with drifting errors)]



Adaptive Loop Control

Adaptive loop control can outperform closed-loop-only or open-loop-only control

Question: which loop control to use, and at what granularity?

[Figure: Closed-Loop vs. Adaptive Loop Control]


Linear Minimum Mean Squared Error Prediction

Linear Minimum Mean Squared Error (LMMSE) Predictor

Notation

$x[n]$: Input (Random); $y[n] = W[n] * x[n]$ (convolution): Predictor

$d[n]$: Desired (Random); $e[n] = d[n] - y[n]$

Performance Function $\xi(\cdot)$

$\xi(W[n]) = E(|e[n]|^2) = E(|d[n] - y[n]|^2)$

Wiener Filter (Impulse Response $W^*[n]$)

$W^*[n] = \arg\min_{W[n]} \xi(W[n])$



Wiener Filter

Transversal Filter (FIR)

x[n] and d[n] are real, stationary processes

FIR Filter $\mathbf{w} = [w_0, w_1, \ldots, w_{N-1}]^T$

Input $\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-N+1]]^T$

$y[n] = \mathbf{w}^T\mathbf{x}[n] = \sum_{i=0}^{N-1} w_i\, x[n-i]$



Wiener Filter

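Prediction Error: $e[n] = d[n] - \mathbf{w}^T\mathbf{x}[n]$

Performance Function $\xi(\cdot)$

$\xi(\mathbf{w}) = E(e^2[n]) = E(d^2[n]) - 2\mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\mathbf{w} = E(d^2[n]) - 2\sum_l w_l\, E(d[n]x[n-l]) + \sum_l \sum_m w_l\, r_{lm}\, w_m$

where $\mathbf{p} = E(\mathbf{x}[n]d[n])$, $\mathbf{R} = E(\mathbf{x}[n]\mathbf{x}^T[n])$

Optimal Filter Tap $\mathbf{w}_o$

$\nabla\xi(\mathbf{w}) = \left[\frac{\partial\xi(\mathbf{w})}{\partial w_0}, \frac{\partial\xi(\mathbf{w})}{\partial w_1}, \ldots, \frac{\partial\xi(\mathbf{w})}{\partial w_{N-1}}\right]^T = \mathbf{0}$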



Wiener Filter

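Optimal Filter Tap $\mathbf{w}_o$

$\xi(\mathbf{w}) = E(d^2[n]) - 2\sum_l w_l\, E(d[n]x[n-l]) + \sum_l \sum_m w_l\, r_{lm}\, w_m$

$\frac{\partial\xi(\mathbf{w})}{\partial w_i} = -2E(d[n]x[n-i]) + 2\sum_l r_{il}\, w_l = 0$

$\nabla\xi(\mathbf{w}) = \mathbf{0} \Rightarrow$ (1) $\mathbf{R}\mathbf{w}_o = \mathbf{p}$, (2) $\xi_{\min} = E(d^2[n]) - \mathbf{w}_o^T\mathbf{p}$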
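Conditions (1) and (2) can be evaluated directly for a small filter. A minimal sketch in plain Python, assuming a unit-variance AR(1) input with $r[k] = \rho^{|k|}$, desired signal $d[n] = x[n+1]$, and a 2-tap filter on $[x[n], x[n-1]]$; this example is mine, not from the slides. The solver is ordinary Gaussian elimination, and all function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def wiener(R, p, d_power):
    """Return (w_o, xi_min) from R w_o = p and xi_min = E(d^2) - w_o^T p."""
    w_o = solve_linear(R, p)
    xi_min = d_power - sum(wi * pi for wi, pi in zip(w_o, p))
    return w_o, xi_min

rho = 0.9
R = [[1.0, rho], [rho, 1.0]]   # r_{lm} = rho^|l-m|
p = [rho, rho ** 2]            # E(x[n-i] x[n+1]) = rho^(i+1)
w_o, xi_min = wiener(R, p, d_power=1.0)
print(w_o, xi_min)
```

For this source the optimum puts all weight on the most recent sample ($\mathbf{w}_o = [\rho, 0]^T$) with $\xi_{\min} = 1 - \rho^2$, since an AR(1) process carries no extra information in $x[n-1]$ once $x[n]$ is known.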



Orthogonality

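$e_o[n]$ is uncorrelated with the filter input $\{x[n-i] \mid i = 0, 1, \ldots, N-1\}$:

$\frac{\partial E(e^2[n])}{\partial w_i} = 0 \Rightarrow 2E\left(e[n]\frac{\partial e[n]}{\partial w_i}\right) = -2\underbrace{E(e[n]x[n-i])}_{\text{Orthogonal}} = 0$

$e_o[n]$ is uncorrelated with the predictor (filter output) $y_o[n] = \mathbf{w}_o^T\mathbf{x}[n]$:

$E(e_o[n]y_o[n]) = \mathbf{w}_o^T E(e_o[n]\mathbf{x}[n]) = 0$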



LMMSE Prediction of Two Random Variables

LMMSE Predictor of Y based on X

(1) $Y_p = \alpha X$

(2) $\alpha_o = \arg\min_{\alpha} E((Y - Y_p)^2)$

$\mu_X = \mu_Y = 0$:

$E(Y) = E(Y_p) = 0$, $\alpha_o = R^{-1}p = \frac{E(XY)}{E(X^2)}$



LMMSE Prediction of Two Random Variables

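$\mu_X \neq 0$, $\mu_Y \neq 0$ + Without Mean Removal ($X = X' + \mu_X$, $Y = Y' + \mu_Y$, where $X'$ and $Y'$ have zero mean)

In general $E(Y) \neq E(Y_p) = \alpha'_o\mu_X$

$\alpha'_o = R^{-1}p = \frac{E((X' + \mu_X)(Y' + \mu_Y))}{E((X' + \mu_X)^2)} = \frac{E(X'Y') + \mu_X\mu_Y}{E(X'^2) + \mu_X^2}$

$\mu_X \neq 0$, $\mu_Y \neq 0$ + Mean Removal

$(Y_p - \mu_Y) = \alpha(X - \mu_X) \Rightarrow Y_p = \alpha X + (\mu_Y - \alpha\mu_X)$

$E(Y) = E(Y_p)$, $\alpha_o = \frac{E((X - \mu_X)(Y - \mu_Y))}{E((X - \mu_X)^2)}$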
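A minimal numerical comparison of the two predictors, using sample moments in place of expectations. The test data ($\mu_X = 5$, $\mu_Y = 3$, slope 0.8, noise level) are my assumptions. Because the mean-removed predictor is the best affine fit on the same samples, its MSE cannot exceed that of $Y_p = \alpha' X$.

```python
import random

rng = random.Random(5)
N = 50_000
# Assumed data: mu_X = 5, mu_Y = 3, and Y - 3 = 0.8 (X - 5) + noise.
xs = [5.0 + rng.gauss(0.0, 1.0) for _ in range(N)]
ys = [3.0 + 0.8 * (x - 5.0) + rng.gauss(0.0, 0.6) for x in xs]

def mean(v):
    return sum(v) / len(v)

mu_x, mu_y = mean(xs), mean(ys)

# Without mean removal: Y_p = alpha' X with alpha' = E(XY) / E(X^2).
alpha_raw = mean([x * y for x, y in zip(xs, ys)]) / mean([x * x for x in xs])
pred_raw = [alpha_raw * x for x in xs]

# With mean removal: Y_p = alpha X + (mu_Y - alpha mu_X).
alpha_o = (mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
           / mean([(x - mu_x) ** 2 for x in xs]))
pred_mr = [alpha_o * x + (mu_y - alpha_o * mu_x) for x in xs]

def mse(pred):
    return mean([(y - p) ** 2 for y, p in zip(ys, pred)])

print(f"alpha' = {alpha_raw:.3f}, mean of Y_p = {mean(pred_raw):.3f} vs mean of Y = {mu_y:.3f}")
print(f"alpha_o = {alpha_o:.3f}, MSE without / with mean removal: "
      f"{mse(pred_raw):.4f} / {mse(pred_mr):.4f}")
```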



Least-Squares Method

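Linear Predictor of Y based on X: $\hat{Y}_p = \alpha X + \beta$ (β needed if $\mu_X \neq 0$, $\mu_Y \neq 0$)

Least-Squares Method

$(\alpha^*, \beta^*) = \arg\min_{\alpha,\beta} \sum (y_i - \alpha x_i - \beta)^2 = \left(\frac{S_{X,Y}}{S_X^2},\ \bar{Y} - \frac{S_{X,Y}}{S_X^2}\bar{X}\right)$, where $\bar{X}$, $\bar{Y}$ are sample means

Interpretation ($S_{X,Y} \to \operatorname{cov}(X,Y)$, $S_X^2 \to \sigma_X^2$, $\bar{Y} \to \mu_Y$, $\bar{X} \to \mu_X$):

$(\hat{Y}_p - \bar{Y}) = \underbrace{\frac{S_{X,Y}}{S_X^2}}_{\alpha^*}(X - \bar{X})$

$(Y_p - \mu_Y) = \underbrace{\frac{E((X - \mu_X)(Y - \mu_Y))}{E((X - \mu_X)^2)}}_{\alpha_o}(X - \mu_X)$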
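A minimal sketch checking that the normal-equation solution (derived in the Appendix) and the moment form $(S_{X,Y}/S_X^2,\ \bar{Y} - \alpha^*\bar{X})$ give the same line. The data points and function names are illustrative.

```python
def least_squares_closed_form(xs, ys):
    """alpha, beta from the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    alpha = (n * sxy - sx * sy) / det
    beta = (sy * sxx - sx * sxy) / det
    return alpha, beta

def least_squares_moments(xs, ys):
    """alpha* = S_XY / S_X^2, beta* = Ybar - alpha* Xbar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    alpha = s_xy / s2_x
    return alpha, y_bar - alpha * x_bar

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a1, b1 = least_squares_closed_form(xs, ys)
a2, b2 = least_squares_moments(xs, ys)
print((a1, b1), (a2, b2))
```

Both forms give $\alpha^* = 1.97$ and $\beta^* = 1.06$ for these points; the moment form is usually preferred numerically because it centers the data before summing products.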



Forward Prediction

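Notation

Forward Prediction Error: $f_m[n] = x[n] - \sum_{i=1}^{m} a_{m,i}\, x[n-i]$

Forward Prediction Error Filter

$f_m[n] = x[n] * a_m[n]$, where $a_m[n]$ has impulse response $\{1, -a_{m,1}, \ldots, -a_{m,m}\}$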
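A minimal sketch confirming that the direct definition of $f_m[n]$ and the FIR form with taps $\{1, -a_{m,1}, \ldots, -a_{m,m}\}$ produce the same output. Treating $x[n] = 0$ for $n < 0$ is my assumption, and the function names are hypothetical.

```python
def forward_error_direct(x, a):
    """f_m[n] = x[n] - sum_{i=1}^{m} a_{m,i} x[n-i], with x[n] = 0 for n < 0 (assumed)."""
    m = len(a)
    return [x[n] - sum(a[i - 1] * x[n - i] for i in range(1, m + 1) if n - i >= 0)
            for n in range(len(x))]

def forward_error_filter(x, a):
    """The same error as an FIR convolution with taps [1, -a_{m,1}, ..., -a_{m,m}]."""
    taps = [1.0] + [-c for c in a]
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]

x = [1.0, 2.0, 4.0, 8.0, 16.0]
a = [2.0]                  # first-order predictor x_hat[n] = 2 x[n-1]
print(forward_error_direct(x, a), forward_error_filter(x, a))
```

A perfect predictor leaves only the start-up sample in the error sequence, which is the idea behind using the prediction error filter as a whitening filter.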


Building Blocks of Video Compression

Block Diagram of Video Encoder

Spatiotemporal DPCM - Energy Reduction

Spatial Transform - Decorrelation, Energy Compaction

Quantization - Perceptual-oriented Lossy Compression

Entropy Coding - Symbol Compaction

Bitstream: Control Info., Motion Vectors, Transform Coefficients



Block Diagram of Video Decoder

Inverse Quantization

Inverse Transform

Spatiotemporal Comp.

Reconstruction


Temporal Prediction

Motion Compensated Temporal Prediction

Motion-compensated DPCM along temporal axis

Closed-Loop - construct predictor from previously coded frames

Open-Loop - construct predictor from source frames

Unidirectional Prediction
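To make motion-compensated temporal prediction concrete, here is a minimal full-search block-matching sketch using the mean absolute difference (MAD), the metric quoted in the block-size comparisons of this lecture. The exhaustive search, the search range, the motion-vector sign convention (vector points from the current block to its match in the reference), and all names are my assumptions; the slides do not specify a motion-search method.

```python
import random

def sad(cur, ref, bx, by, dx, dy, B):
    """Sum of absolute differences between the BxB block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(B):
        for x in range(B):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, B, search_range):
    """Exhaustive integer-pel search; returns (dx, dy, MAD) of the best match."""
    H, W = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + B <= W and 0 <= by + dy and by + dy + B <= H):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, B)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    dx, dy, cost = best
    return dx, dy, cost / (B * B)

# Synthetic test: the current frame is the reference shifted by (3, -2),
# with wrap-around that only matters near the borders.
rng = random.Random(6)
W = H = 32
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
dx0, dy0 = 3, -2
cur = [[ref[(y + dy0) % H][(x + dx0) % W] for x in range(W)] for y in range(H)]
mv = full_search(cur, ref, bx=8, by=8, B=8, search_range=4)
print(mv)
```

Full search costs $(2R+1)^2$ block comparisons per block for search range R, which is why practical encoders often use faster search patterns.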



Block Size of Motion Compensation

[Figure: Original (frames t=0 and t=9); Zero motion (t=9 predicted from t=0 without MC, residual MAD = 32.23); 16x16 (t=9 predicted from t=0 with 16x16 MC, residual MAD = 14.36)]



Block Size of Motion Compensation

[Figure: residuals at t=9 predicted from t=0 with 4x4, 8x8, and 16x16 MC: MAD = 6.00, 9.98, and 14.36, respectively]



Variable Block Size Motion Compensation

Motion compensation can be switched among different block sizes

Extra bits for signaling motion vectors and block partitions

[Figure: Macroblock partitions: 1 partition of 16*16, 2 partitions of 16*8, 2 partitions of 8*16, or 4 sub-macroblocks of 8*8 luma samples and associated chroma samples. Sub-macroblock partitions: 1 partition of 8*8, 2 partitions of 8*4, 2 partitions of 4*8, or 4 partitions of 4*4 luma samples and associated chroma samples.]

[Figure: Quad-Tree-based Block Sizing; Selection of Block Size]



Block Size Statistics

[Figure: block-size statistics, Pedestrian (HD) and Rush Hour (HD)]

4x4 may not be the most favorable block size

Temporal prediction may become less beneficial for HD at high rates



Block Size Statistics

[Figure: block-size statistics, Pedestrian (QCIF) and Rush Hour (QCIF)]

Smaller block sizes are preferred at high bit rates

Temporal prediction yields significant coding gain



Occlusion

Better match in Frame (t+1) instead of Frame (t-1)

Extra bits for signaling motion vectors and prediction directions

Bidirectional Prediction



Coding Order vs. Display Order

Bidirectional prediction requires a frame buffer and picture reordering

Encoding/Decoding order: 0, 2, 1, 5, 3, 4

Display order: 0, 1, 2, 3, 4, 5



Hierarchical Bidirectional Prediction

Better coding efficiency than IBP...

Encoding/Decoding order: 0, 4, 2, 1, 3,..

Display order: 0, 1, 2, 3, 4,..
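The hierarchical coding order above can be generated recursively. A minimal sketch, assuming a dyadic GOP whose size is a power of two: the two key pictures are coded first, then each interval's midpoint is coded as a B picture once both of its end points are available. This depth-first traversal reproduces the slide's sequence 0, 4, 2, 1, 3; other valid orders exist (for example, coding all pictures of one temporal level before the next). The function name is hypothetical.

```python
def hierarchical_coding_order(gop_size):
    """Coding order for one dyadic hierarchical-B GOP (gop_size a power of two)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [0, gop_size]              # key pictures are coded first

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)              # B picture predicted from lo and hi
        split(lo, mid)
        split(mid, hi)

    split(0, gop_size)
    return order

print(hierarchical_coding_order(4))
print(hierarchical_coding_order(8))
```

The decoder must hold every picture that is still referenced, so deeper hierarchies trade coding efficiency for buffer size and delay.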



Hierarchical Prediction + Adaptive Loop Control

Open-loop for B pictures but Closed-loop for P pictures

Adaptive loop control can also be applied at macroblock level



Multiple Reference Frames

Different regions can be predicted from different reference frames

Reference frames 0, 2 must be stored

Extra bits for signaling reference pictures, and increased buffer size

Multiple Reference Prediction



Subpixel Motion Compensation

Better match may be found from interpolated sample positions

Extra bits for signaling motion vectors of higher precision

[Figure: Frame (t-1) and Frame (t), matched at integer positions and after sub-pixel sampling]



Subpixel Motion Compensation

[Figure: panels Integer Pel, Sub-Pel (Horizontal), Diff. (Horizontal), Sub-Pel (Vertical), Sub-Pel (Diagonal), Diff. (Diagonal)]



Subpixel Interpolation

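[Figure: Luma: integer sample positions (A to U, including E, F, G, H, I, J), half-pel positions (aa, bb, b, h, j, m, s, cc, dd, ee, ff, gg, hh), and quarter-pel positions (a, c, d, e, f, g, i, k, n, p, q, r). Chroma: a fractional position between integer samples A, B, C, D with offsets xFracC, yFracC and complementary weights 8-xFracC, 8-yFracC.]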



Subpixel Interpolation

Half Pel Samples (b, h, m, s), j

$b = E - 5F + 20G + 20H - 5I + J$, $j = aa - 5bb + 20b + 20s - 5gg + hh$

Quarter Pel Samples (a, c, d, n, f, q, i, k)

$a = (G + b)/2$, $f = (b + j)/2$

Quarter Pel Samples (e, g, p, r)

$e = (b + h)/2$, $g = (b + m)/2$
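The formulas above give the unnormalized filter sums and plain averages. A minimal sketch of how they are typically turned into 8-bit samples; the rounding offsets, shifts, and clipping used here ($(b_1 + 16) \gg 5$, $(j_1 + 512) \gg 10$, $(p + q + 1) \gg 1$) follow my understanding of the H.264/AVC luma interpolation process and should be checked against the standard. The 8-bit depth and all function names are assumptions.

```python
def clip1(v, bit_depth=8):
    """Clip to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, v))

def six_tap(p):
    """Unnormalized 6-tap (1, -5, 20, 20, -5, 1) filter over six samples."""
    return p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]

def half_pel_b(row6):
    """Half-pel b between G = row6[2] and H = row6[3]; row6 = [E, F, G, H, I, J]."""
    return clip1((six_tap(row6) + 16) >> 5)

def half_pel_j(block6x6):
    """Center half-pel j: horizontal 6-tap on each of six rows, kept unnormalized
    (aa, bb, b, s, gg, hh), then a vertical 6-tap and a single rounding step."""
    intermediates = [six_tap(row) for row in block6x6]
    return clip1((six_tap(intermediates) + 512) >> 10)

def quarter_pel(p, q):
    """Average of two neighbouring integer/half-pel samples, rounding up."""
    return (p + q + 1) >> 1

row = [10, 20, 30, 40, 50, 60]      # a linear ramp E..J
b = half_pel_b(row)                 # lands midway between G = 30 and H = 40
a = quarter_pel(row[2], b)          # quarter-pel a between G and b
print(b, a, half_pel_j([[100] * 6 for _ in range(6)]))
```

Keeping the intermediate values for j unnormalized avoids rounding twice, and the clipping matters because the negative taps can overshoot at sharp edges.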


Statistics of Temporal Prediction

Motion Vector Statistics

[Figure: motion vector statistics, Pedestrian (HD) and Rush Hour (HD)]



Motion Vector Statistics

[Figure: motion vector statistics, Pedestrian (QCIF) and Rush Hour (QCIF)]



Coding Gain

Comparison of Coding Tools

[Figure: PSNR-Y vs. rate (bits/s) for Mobile and Foreman; curves for the cumulative tool sets 16x16, +8x16/16x8, +8x8, +8x4/4x8/4x4, +Subpixel, +NumFrame=5, +IBBP, +GOP8]



Bitstream Composition

Motion Information vs. Compression Ratio

[Figure: percentage (%) of the bitstream spent on motion information vs. QP (15 to 50) for Mobile and Foreman, for the cumulative tool sets 16x16, +16x8/8x16, +8x8, +4x8/8x4/4x4, +Subpixel, +NumberFrame=5, +IBBP, +GOP8]



Appendix

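$\partial\left(\sum_l \sum_m w_l\, r_{lm}\, w_m\right)/\partial w_i$

$\sum_l \sum_m w_l\, r_{lm}\, w_m = \sum_l w_l\, f_l(w_i) = \sum_{l=0,\, l \neq i}^{N-1} w_l\, f_l(w_i) + w_i\, f_i(w_i)$, where $f_l(w_i) = \sum_m r_{lm}\, w_m$

Take derivative w.r.t. $w_i$:

$\partial f_l(w_i)/\partial w_i = r_{li}$

$\partial(w_i\, f_i(w_i))/\partial w_i = f_i(w_i) + r_{ii}\, w_i$

Thus

$\partial\left(\sum_l \sum_m w_l\, r_{lm}\, w_m\right)/\partial w_i = \sum_{l=0,\, l \neq i}^{N-1} w_l\, r_{li} + \sum_{m=0}^{N-1} r_{im}\, w_m + r_{ii}\, w_i = \sum_{l=0}^{N-1} r_{il}\, w_l + \sum_{m=0}^{N-1} r_{im}\, w_m = 2\sum_{l=0}^{N-1} r_{il}\, w_l$

(using the symmetry of $\mathbf{R}$, $r_{li} = r_{il}$)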
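The result can be checked numerically with central finite differences. A minimal sketch, assuming a random symmetric $\mathbf{R}$ (built as $AA^T$) and random taps; for a quadratic form the central difference is exact up to rounding, so the gap to $2\sum_l r_{il} w_l$ should be tiny. The names are illustrative.

```python
import random

def quad_form(w, R):
    """sum_l sum_m w_l r_lm w_m."""
    n = len(w)
    return sum(w[l] * R[l][m] * w[m] for l in range(n) for m in range(n))

rng = random.Random(7)
N = 4
A = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
# R = A A^T is symmetric, like an autocorrelation matrix.
R = [[sum(A[l][k] * A[m][k] for k in range(N)) for m in range(N)] for l in range(N)]
w = [rng.uniform(-1, 1) for _ in range(N)]

h = 1e-6
max_gap = 0.0
for i in range(N):
    w_plus = w[:]
    w_plus[i] += h
    w_minus = w[:]
    w_minus[i] -= h
    numeric = (quad_form(w_plus, R) - quad_form(w_minus, R)) / (2 * h)
    analytic = 2 * sum(R[i][l] * w[l] for l in range(N))
    max_gap = max(max_gap, abs(numeric - analytic))
print(f"largest |numeric - analytic| = {max_gap:.2e}")
```

For a non-symmetric matrix the derivative would instead be $\sum_l (r_{il} + r_{li})\, w_l$, which is why the symmetry step matters.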



Appendix

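$f(\alpha, \beta) = \sum (y_i - \alpha x_i - \beta)^2$

$\partial f(\alpha, \beta)/\partial\alpha = 0$, $\partial f(\alpha, \beta)/\partial\beta = 0$

$\Rightarrow \sum y_i x_i = \alpha\sum x_i x_i + \beta\sum x_i$, $\sum y_i = \alpha\sum x_i + n\beta$

$\begin{bmatrix} \sum y_i x_i \\ \sum y_i \end{bmatrix} = \begin{bmatrix} \sum x_i x_i & \sum x_i \\ \sum x_i & n \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \frac{1}{n\sum x_i^2 - (\sum x_i)^2}\begin{bmatrix} n & -\sum x_i \\ -\sum x_i & \sum x_i x_i \end{bmatrix}\begin{bmatrix} \sum y_i x_i \\ \sum y_i \end{bmatrix}$

$\alpha = \frac{n\sum y_i x_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$, $\beta = \frac{\sum y_i \sum x_i x_i - \sum x_i \sum y_i x_i}{n\sum x_i^2 - (\sum x_i)^2}$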




