EE569 Digital Video Processing 1 Roadmap Introduction Intra-frame coding Inter-frame coding...

EE569 Digital Video Processing

Roadmap

Introduction

Intra-frame coding

Inter-frame coding

Object-based and scalable video coding
– Why object-based? Motion segmentation, shape coding, R-D optimization
– Scalability issues: spatial/temporal/quality scalabilities


Object-based Video Coding

Waveform-based coding discussed so far uses a simple source model (e.g., H.261/263/264, MPEG-1/-2)
– Does not consider the semantic content (e.g., objects and their shape) of the video

Object-based video coding identifies objects (or regions) in a video and encodes them. Potential benefits may include
– Improved coding efficiency
– Improved visual quality (e.g., no blocking artifacts)
– Content description
– Content-based interactivity

Also called "content-dependent video coding"
– The buzzword for MPEG-4, but less successful than expected (so the important question is to understand why it does not work so well)


Essential Tasks in Object-based Video Coding

Object/region segmentation
– Separate pixels based on their color, texture, and motion characteristics
– Closely related to motion detection and segmentation
– Intrinsically ill-defined and desperate for a breakthrough

2D shape modeling and coding
– Not all shapes are equally probable
– Subtle implications for video coding (hidden pitfalls)

2D texture modeling and coding
– Extension of existing block-based MCP to region-based MCP
– Deformable textures (tradeoff between spatial and temporal prediction)


Object/Region Segmentation

The major challenge in content/object-based coding

Common approaches for segmentation in a still image: gray-level thresholding, clustering, edge detection, region growing, splitting and merging

Object segmentation in video
– Motion information can be utilized, but how?
– Should we put more trust in motion cues or spatial cues?


Motion-based Segmentation

Motion-based segmentation: to segment an image using motion information
– We can first estimate the motion field and then segment the motion field
– However, estimation and segmentation are like two sides of the same coin


A Mind-bothering Example

[Figure: Frame 1 and Frame 2]

It is easy to convince yourself that the tree branches are moving. But how do we know the sky is still? What if it were also moving at the same speed (shouldn't we observe the same intensity patterns, because the sky is a smooth region)?


Implications for Video Coding

True motion representation might be useful to computer vision and motion perception, but it is not indispensable in video coding

The fundamental reason lies in the relationship between motion representation and video coding: how to tolerate the uncertainty in motion?

The same issue remains in object-based image coding: how to tolerate the uncertainty in shape? (we will discuss this in more detail later)


Simplified Segmentation: Change Detection

To detect the changing parts in a video from time t_i to time t_j, we compute a difference image and threshold the difference by T:

$$ d_{ij}(x,y) = \begin{cases} 1 & \text{if } |f(x,y,t_i) - f(x,y,t_j)| > T \\ 0 & \text{otherwise} \end{cases} $$

d_ij(x,y) can be further processed, e.g., to remove isolated 1's or to group 1's that are close to each other

[Figure: frames f(x, y, t_i) and f(x, y, t_j)]
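
The thresholding and clean-up steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: frames are small lists of gray levels, the function names `change_mask` and `remove_isolated_ones` are made up for this example, and "isolated" is taken to mean "no 8-connected neighbour equal to 1".

```python
def change_mask(frame_i, frame_j, threshold):
    """Return d_ij: 1 where |f(x,y,t_i) - f(x,y,t_j)| > threshold, else 0."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_i, row_j)]
        for row_i, row_j in zip(frame_i, frame_j)
    ]


def remove_isolated_ones(mask):
    """Clear every 1 that has no 8-connected neighbour equal to 1 (assumed rule)."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            neighbours = sum(
                mask[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy, xx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


# Toy 3x4 frames: two adjacent pixels change, plus one isolated change.
frame_i = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
frame_j = [[10, 90, 95, 10], [10, 10, 10, 10], [80, 10, 10, 10]]

mask = change_mask(frame_i, frame_j, threshold=20)
cleaned = remove_isolated_ones(mask)
print(mask)     # [[0, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(cleaned)  # [[0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The isolated change at the lower left is discarded, while the two adjacent changed pixels survive.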


Change Detection: Pros and Cons

Simple to implement; fast

Detects all changes

Detects even unwanted changes

Positive and negative changes detected (occlusion)

Difficult to quantify motion

Requires a static reference frame


Change Detection: An Example

Monitor the traffic


Without a Static Reference Frame

Background extraction methods
– Ad-hoc median detector (your CA#6)
– To eliminate the impact of (small) moving objects, use the "robust estimator" approach to iteratively remove the outliers
– More sophisticated approaches model the background by a mixture of Gaussian distributions and use graph-cut based optimization
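
A minimal sketch of the median idea, assuming a stack of co-located gray-level frames stored as nested lists. The helper name `median_background` is hypothetical, and this is not meant to reproduce the CA#6 assignment: each background pixel is simply the temporal median of that pixel over all frames, which stays at the background value as long as a moving object covers the pixel in fewer than half of the frames.

```python
from statistics import median


def median_background(frames):
    """Estimate the background as the per-pixel temporal median of the frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


# Five 1x4 frames: background level 50, a bright object (200) sweeps across.
frames = [
    [[200, 50, 50, 50]],
    [[50, 200, 50, 50]],
    [[50, 50, 200, 50]],
    [[50, 50, 50, 200]],
    [[50, 50, 50, 50]],
]

background = median_background(frames)
print(background)  # [[50, 50, 50, 50]]

# Change detection against the estimated background (threshold 30 is arbitrary).
foreground = [[1 if abs(p - b) > 30 else 0 for p, b in zip(row, brow)]
              for row, brow in zip(frames[1], background)]
print(foreground)  # [[0, 1, 0, 0]]
```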


Simplified Segmentation: Global Motion Estimation

Planar homography (feature-based)
– Homogeneous coordinates
– Conditions for planar homography
– Homography estimation from feature correspondence

Hierarchical model-based GME (feature-less)
– Directly minimize an energy function (the MSE of MCP errors)
– Solve the optimization problem in a coarse-to-fine fashion (more robust and efficient)


Plane Homography
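
To make the role of homogeneous coordinates concrete, here is a small sketch of how a point is mapped through a 3x3 homography. The function name `apply_homography` and the example matrices are assumptions for illustration only: the point (x, y) is lifted to (x, y, 1), multiplied by H, and divided by the third component.

```python
def apply_homography(H, x, y):
    """Map (x, y) through the 3x3 matrix H using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


# A pure translation is the simplest special case of a homography.
H_shift = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(apply_homography(H_shift, 3, 4))  # (8.0, 2.0)

# A projective term in the last row makes the mapping depend on position.
H_proj = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
print(apply_homography(H_proj, 2, 4))   # (1.0, 2.0)

# H is only defined up to scale: 2*H maps points identically.
H_scaled = [[2 * v for v in row] for row in H_proj]
print(apply_homography(H_scaled, 2, 4))  # (1.0, 2.0)
```

Because of the scale ambiguity, a homography has eight degrees of freedom, which is why four point correspondences (two equations each) are the usual minimum for estimation.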


Model-based GME

Target function for minimization

Solution: Gauss-Newton method

Bergen, J. R., Anandan, P., Hanna, K. J., and Hingorani, R. "Hierarchical Model-Based Motion Estimation." In Proc. of the Second European Conference on Computer Vision, pp. 237-252, 1992
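
As a rough sketch of what such a minimization usually looks like (the notation below is an assumption for illustration and is not taken from the slide or from Bergen et al.): let $\mathbf{u}(\mathbf{x};\mathbf{a})$ be the global motion model with parameter vector $\mathbf{a}$, and let $I_1$, $I_2$ be the two frames.

```latex
% Energy: sum of squared motion-compensated prediction errors
E(\mathbf{a}) = \sum_{\mathbf{x}} e(\mathbf{x};\mathbf{a})^2,
\qquad
e(\mathbf{x};\mathbf{a}) = I_2\bigl(\mathbf{x} + \mathbf{u}(\mathbf{x};\mathbf{a})\bigr) - I_1(\mathbf{x})

% Gauss-Newton step: linearize e around the current estimate and solve the
% resulting least-squares problem for the parameter increment
\delta\mathbf{a} = -\Bigl(\sum_{\mathbf{x}} J^{T} J\Bigr)^{-1} \sum_{\mathbf{x}} J^{T} e,
\qquad
J(\mathbf{x}) = \nabla I_2^{T}\,\frac{\partial \mathbf{u}}{\partial \mathbf{a}},
\qquad
\mathbf{a} \leftarrow \mathbf{a} + \delta\mathbf{a}
```

In a coarse-to-fine scheme the same update would be iterated at the coarsest pyramid level first, and the resulting parameters used to initialize the next finer level.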


Multi-resolution GME


Numerical Example


Summary for Change Detection and Global Motion Estimation

Motion segmentation becomes relatively easy to solve when either the camera is still or the background objects belong to a plane

Latest advances include joint motion segmentation and estimation using level-set methods (PDE-based formulation)

Mansouri, A.-R. and Konrad, J. "Multiple motion segmentation with level sets." IEEE Transactions on Image Processing, vol. 12, no. 2, pp. 201-220, Feb. 2003


2-D Shape Modeling and Coding

Bitmap coding: a binary map specifying whether or not a pixel belongs to an object
– A special case of the general alpha-map

Contour coding: code only the contour of the object or the region
– Chain codes
– Polygon approximation
– Spline approximation
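
As an illustration of contour coding, the sketch below encodes a closed contour with an 8-direction chain code. The direction numbering (0 = right, increasing counter-clockwise on screen, with y growing downward) is one common convention chosen here as an assumption, and the function names are hypothetical. Only the start point and one 3-bit symbol per contour step need to be stored.

```python
# (dx, dy) -> code, with x to the right and y downward (assumed convention).
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour):
    """Encode an ordered list of 8-connected contour points as direction codes."""
    return [
        DIRECTIONS[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(contour, contour[1:])
    ]


def decode_chain(start, codes):
    """Rebuild the contour points from the start point and the chain code."""
    steps = {code: delta for delta, code in DIRECTIONS.items()}
    points = [start]
    for code in codes:
        dx, dy = steps[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


# Boundary of a 3x3 square, traversed from the top-left corner.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
codes = chain_code(square)
print(codes)  # [0, 0, 6, 6, 4, 4, 2, 2]
print(decode_chain(square[0], codes) == square)  # True
```

The long runs of identical symbols hint at why chain codes compress well for smooth contours: not all shapes (and hence not all code sequences) are equally probable.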


Image Matting (Soft segmentation)

$$ X(i,j) = \alpha(i,j)\,F(i,j) + [1 - \alpha(i,j)]\,B(i,j), \qquad 0 \le \alpha(i,j) \le 1 $$

Not for coding but for interactive editing
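
The matting equation can be exercised directly. The sketch below is a plain-Python illustration with made-up pixel values; the name `composite` is hypothetical. It blends a foreground F and a background B with a per-pixel alpha, and a binary alpha (only 0 or 1) reduces to the bitmap shape mask discussed earlier.

```python
def composite(alpha, fg, bg):
    """Per-pixel blend X = alpha * F + (1 - alpha) * B."""
    return [
        [a * f + (1 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, fg, bg)
    ]


alpha = [[0.0, 0.5, 1.0]]   # background only, half-and-half, foreground only
fg = [[200, 200, 200]]
bg = [[100, 100, 100]]

blended = composite(alpha, fg, bg)
print(blended)  # [[100.0, 150.0, 200.0]]
```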


2-D Texture Modeling and Coding*

Shape-adaptive DCT

Shape-adaptive wavelet transform


Roadmap

Introduction

Intra-frame coding
– Review of JPEG

Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)

Scalable video coding
– 3D subband/wavelet coding and recent trend


Scalable vs. Multicast

What is scalable coding?

Multicast: foreman.yuv → foreman128k.cod, foreman256k.cod, foreman512k.cod, foreman1024k.cod (one bitstream per rate)

Scalable coding: foreman.yuv → foreman.cod (a single bitstream with truncation points at 128, 256, 512, and 1024)


Spatial scalability

Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0


Temporal scalability

Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0

– Frame 0,1,2,3,4,5,…: 30 Hz
– Frame 0,2,4,6,8,…: 15 Hz
– Frame 0,4,8,12,…: 7.5 Hz


SNR (Rate) scalability

Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0

[Figure: truncation points of the bitstream labeled PSNR_avg = 30 dB, 35 dB, and 40 dB]

$$ \mathrm{PSNR}_{avg} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PSNR}_i $$

PSNR_i: PSNR of frame i
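
For concreteness, a small sketch of the averaging. The per-frame PSNR definition used here (10 log10 of peak squared over MSE, with an 8-bit peak of 255) is the usual one but is an assumption, since the slide only defines the average; the function names are hypothetical.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB of one frame (nested lists of pixels); infinite if identical."""
    diffs = [
        (r - d) ** 2
        for row_r, row_d in zip(reference, decoded)
        for r, d in zip(row_r, row_d)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


def average_psnr(per_frame_psnr):
    """PSNR_avg = (1/N) * sum of the per-frame PSNR values."""
    return sum(per_frame_psnr) / len(per_frame_psnr)


# Every pixel off by one gray level gives MSE = 1.
one_frame = psnr([[10, 20], [30, 40]], [[11, 21], [31, 41]])
print(round(one_frame, 2))               # 48.13
print(average_psnr([30.0, 35.0, 40.0]))  # 35.0
```

Note that this averages the dB values frame by frame, as in the formula above, which is not the same as computing one PSNR from the MSE averaged over the sequence.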


Scalability via Bit-Plane Coding

$$ A = \pm\left(a_0 + a_1 2 + a_2 2^2 + \cdots + a_7 2^7\right) $$

a_0: Least Significant Bit (LSB); a_7: Most Significant Bit (MSB); plus a sign bit

Example: A = 129: sign = +, a_0 a_1 a_2 … a_7 = 10000001

Example: sign = −, a_0 a_1 a_2 … a_7 = 00110011: A = −(4+8+64+128) = −204
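
The two examples can be reproduced with a short sketch. The function names and the MSB-first truncation rule are assumptions made for illustration: a coefficient is split into a sign and eight magnitude bits a_0 … a_7, and a decoder that has received only the top few bit planes treats the missing lower planes as zero, which is what gives quality scalability.

```python
def to_bitplanes(value, nbits=8):
    """Split an integer into a sign and magnitude bits [a0 (LSB), ..., a7 (MSB)]."""
    sign = "-" if value < 0 else "+"
    magnitude = abs(value)
    bits = [(magnitude >> k) & 1 for k in range(nbits)]
    return sign, bits


def from_bitplanes(sign, bits, planes_received=None):
    """Rebuild the value from the top `planes_received` planes (MSB first)."""
    nbits = len(bits)
    if planes_received is None:
        planes_received = nbits
    magnitude = sum(bits[k] << k for k in range(nbits - planes_received, nbits))
    return -magnitude if sign == "-" else magnitude


print(to_bitplanes(129))   # ('+', [1, 0, 0, 0, 0, 0, 0, 1])
sign, bits = to_bitplanes(-204)
print(sign, bits)          # - [0, 0, 1, 1, 0, 0, 1, 1]

# Progressive refinement as more bit planes arrive.
print(from_bitplanes(sign, bits, planes_received=2))  # -192
print(from_bitplanes(sign, bits, planes_received=6))  # -204
print(from_bitplanes(sign, bits))                     # -204
```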


Why Is DPCM Bad for Scalability?

[Figure: layered prediction structures over frame numbers 1, 2, 3, … Base layer: Ibase P P P. Enhancement layer 1: Ienh1 P P P. Enhancement layer 2: Ienh2 P P P.]

Annotations:
– suffer from drifting problem
– suffer from coding efficiency loss


Fine Granular Scalability (FGS)

H.264 with/without FGS option

Foreman sequence (5 fps)
– Base layer: 20 kbps
– Enhancement layer: variable bit-rate

[Figure: rate-distortion curves with and without FGS; efficiency gap of ~2 dB]


3D Wavelet/Subband Coding

2D spatial WT + 1D temporal WT

[Figure: video volume with axes x, y, and t]


Wavelet Video Coder

[Figure: original video frames 0 to 7 → temporal wavelet transform (temporal subbands LLL, LLH, LH, H) → spatial wavelet transform → embedded quantization & entropy coding]

[Taubman & Zakhor, 1994] [Ohm, 1994] [Choi & Woods, 1999] [Hsiang & Woods, VCIP '99] . . . and others


Motion-Adaptive 3D Wavelet Transform

Recall Haar transform

$$ d(n) = x(2n) - x(2n+1), \qquad s(n) = \tfrac{1}{2}\bigl(x(2n) + x(2n+1)\bigr) $$

Lifting-based implementation

$$ d(n) = x(2n) - x(2n+1), \qquad s(n) = x(2n+1) + \tfrac{1}{2}\,d(n) $$

Motion-adaptive Haar transform

$$ d_n = f_{2n} - W[f_{2n+1}], \qquad s_n = f_{2n+1} + \tfrac{1}{2}\,W^{-1}[d_n] $$

W, W^{-1}: forward and backward motion vector
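
A toy sketch of motion-adaptive Haar lifting on a pair of 1-D "frames". Everything here is an illustrative assumption rather than the scheme of any particular codec: the warp W is a circular integer shift, the predict step is d = f_even − W[f_odd], the update step is s = f_odd + ½ W⁻¹[d], and the names `warp`, `analyze`, and `synthesize` are hypothetical. The point is that the lifting structure stays exactly invertible whatever motion is used, while good motion drives the high band toward zero.

```python
def warp(frame, shift):
    """Toy motion model: circularly shift a 1-D frame right by `shift` samples."""
    n = len(frame)
    return [frame[(i - shift) % n] for i in range(n)]


def analyze(f_even, f_odd, mv):
    """Predict step then update step of the motion-adaptive Haar lifting."""
    # High band: d = f_even - W[f_odd]
    d = [a - b for a, b in zip(f_even, warp(f_odd, mv))]
    # Low band: s = f_odd + (1/2) W^-1[d]
    s = [b + 0.5 * c for b, c in zip(f_odd, warp(d, -mv))]
    return s, d


def synthesize(s, d, mv):
    """Undo the lifting steps in reverse order."""
    f_odd = [a - 0.5 * c for a, c in zip(s, warp(d, -mv))]
    f_even = [c + b for c, b in zip(d, warp(f_odd, mv))]
    return f_even, f_odd


# An impulse that moves one sample to the right between the two frames.
f_odd = [0, 0, 8, 0, 0, 0]
f_even = [0, 0, 0, 8, 0, 0]

s, d = analyze(f_even, f_odd, mv=1)      # correct motion
print(d)                                 # [0, 0, 0, 0, 0, 0]

s0, d0 = analyze(f_even, f_odd, mv=0)    # no motion compensation (plain Haar)
print(d0)                                # [0, 0, -8, 8, 0, 0]

# Perfect reconstruction holds in both cases.
print(synthesize(s, d, mv=1) == (f_even, f_odd))    # True
print(synthesize(s0, d0, mv=0) == (f_even, f_odd))  # True
```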


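Lifting

[Figure: lifting structure with motion compensation. Analysis: even and odd frames pass through predict (P) and update (U) steps and gains G0 and G1 to give the low band and the high band. Synthesis: the low band and high band are scaled by the inverse gains and passed back through U and P to recover the even and odd frames.]

[Secker & Taubman, 2001] [Popescu & Bottreau, 2001]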


MC Wavelet Coding vs. H.264/AVC

[Figure: luminance PSNR (dB), 20 to 38, versus bit-rate (Mbps), 0.2 to 2.0. Curves: scalable MC 5/3 wavelet; non-scalable H.264/AVC. Sequence: Mobile CIF]

H.264/AVC
• high complexity RD control
• CABAC
• PBBPBBP . . .
• 5 prev/3 future reference frames
• data courtesy of M. Flierl

[Taubman & Secker, VCIP 2003] courtesy D. Taubman


Wavelet Synthesis with Lossy Motion Vector

[Figure: video in → motion estimator and MC wavelet transform → embedded encoding (of both wavelet coefficients and motion) → decoders → inverse wavelet transform → video out. Rate allocation: minimize J = D + λR]

[Taubman & Secker, ICIP03]


R-D Performance with Lossy Motion Vector

[Figure: video PSNR (dB), 24 to 40, versus bit-rate (kbps), 0 to 1200. Sequence: CIF Foreman. Curves: embedded wavelet coefficients with lossless motion; non-embedded single-rate; embedded wavelet coefficients with lossy motion]

[Taubman & Secker, VCIP 2003] courtesy D. Taubman


Surprising Success of ITU-T Rec. H.263

What H.263 was developed for . . .

Analog videophone

. . . and what it was used for.

Internet video streaming


What is Streaming Video?

[Figure: data path from a source (cnn.com) across the Internet (Domains A, B, and C, connected through access switches) to Receiver 1 and Receiver 2 (RealPlayer)]

• Download mode: no delay bound
• Streaming mode: delay bound


Outline
• Challenges for quality video transport
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
• Summary


Time-varying Available Bandwidth

[Figure: data path from the source (cnn.com) through Domains A and B (access switches) to the receiver (RealPlayer) over a 56 kb/s link; the available rate varies between R >= 56 kb/s and R < 56 kb/s]

No bandwidth reservation


Time-varying Delay

[Figure: data path from the source (cnn.com) through Domains A and B (access switches) to the receiver (RealPlayer) over a 56 kb/s link]

Delayed packets regarded as lost


Effect of Packet Loss

[Figure: data path from the source through Domains A and B (access switches) to the receiver; received video with no packet loss versus with loss of packets]

No retransmission


Unicast vs. Multicast

[Figure: unicast and multicast delivery trees]

Pros and cons?


Heterogeneity for Multicast

[Figure: a source multicasting over the Internet (Domains A, B, and C; access switches; gateway; Ethernet; telephone networks) to Receivers 1, 2, and 3, whose links run at 64 kb/s, 256 kb/s, and 1 Mb/s]

• Network heterogeneity
• Receiver heterogeneity

What quality?




Architecture for Video Streaming


Video Compression

[Figure: a layered coder produces Layer 0, Layer 1, and Layer 2; decoders (D) and adders combine the layers into outputs at 64 kb/s, 256 kb/s, and 1 Mb/s]

Layered video encoding/decoding. D denotes the decoder.


Application of Layered Video

[Figure: a source delivering layered video by IP multicast over the Internet (Domains A, B, and C; access switches; gateway) to Receivers 1, 2, and 3, whose links run at 64 kb/s, 256 kb/s, and 1 Mb/s]


Application-layer QoS Control

Congestion control (using rate control):
– Source-based, requires rate-adaptive compression or rate shaping
– Receiver-based
– Hybrid

Error control:
– Forward error correction (FEC)
– Retransmission
– Error resilient compression
– Error concealment


Congestion Control
• Window-based vs. rate control (pros and cons?)

[Figure: window-based control and rate control]


Source-based Rate Control


Video Multicast
• How to extend source-based rate control to multicast?
• Limitation of source-based rate control in multicast
• Trade-off between bandwidth efficiency and service flexibility


Receiver-based Rate Control

[Figure: IP multicast for layered video. A source delivers layers over the Internet (Domains A, B, and C; access switches; gateway) to Receivers 1, 2, and 3, whose links run at 64 kb/s, 256 kb/s, and 1 Mb/s]
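
The receiver-side decision can be sketched as follows. This is a simplified illustration, not a description of any specific protocol: it assumes the three rates in the figure are cumulative (64 kb/s for layer 0 alone, 256 kb/s for layers 0 and 1, 1 Mb/s taken as 1000 kb/s for all three), and the names `CUMULATIVE_RATES_KBPS` and `layers_to_subscribe` are hypothetical. Each receiver joins as many multicast layers as its own bandwidth allows, without any adjustment at the source.

```python
# Assumed cumulative rates: layer 0, layers 0+1, layers 0+1+2.
CUMULATIVE_RATES_KBPS = [64, 256, 1000]


def layers_to_subscribe(available_kbps, cumulative_rates=CUMULATIVE_RATES_KBPS):
    """Return how many layers (starting from the base layer) fit the bandwidth."""
    count = 0
    for rate in cumulative_rates:
        if rate > available_kbps:
            break
        count += 1
    return count


for bandwidth in (30, 64, 300, 1000):
    print(bandwidth, "kb/s ->", layers_to_subscribe(bandwidth), "layer(s)")
# 30 kb/s -> 0 layer(s)
# 64 kb/s -> 1 layer(s)
# 300 kb/s -> 2 layer(s)
# 1000 kb/s -> 3 layer(s)
```

In practice the available bandwidth is not known in advance, so a receiver would have to infer it (for example from observed loss) and add or drop layers accordingly.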


Error Control
• FEC
– Channel coding
– Source coding-based FEC
– Joint source/channel coding
• Delay-constrained retransmission
• Error resilient compression
• Error concealment


Channel Coding


Delay-constrained Retransmission




Continuous Media Distribution Services

• Content replication (caching & mirroring)

• Network filtering/shaping/thinning

• Application-level multicast (overlay networks)


Caching
• What is caching?
• Why use caching? Does WWW mean World Wide Wait?
• Pros and cons?




Streaming Server
• Different from a web server
– Timing constraints
– Video-cassette-recorder (VCR) functions (e.g., fast forward/backward, random access, and pause/resume)
• Design of streaming servers
– Real-time operating system
– Special disk scheduling schemes


Media Synchronization
• Why media synchronization?
• Example: lip-synchronization (video/audio)


Protocols for Streaming Video
• Network-layer protocol: Internet Protocol (IP)
• Transport protocol:
– Lower layer: UDP & TCP
– Upper layer: Real-time Transport Protocol (RTP) & Real-Time Control Protocol (RTCP)
• Session control protocol:
– Real-Time Streaming Protocol (RTSP): RealPlayer
– Session Initiation Protocol (SIP): Microsoft Windows MediaPlayer; Internet telephony


Protocol Stacks


Summary
• Challenges for quality video transport
– Time-varying available bandwidth
– Time-varying delay
– Packet loss
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
