EE569 Digital Video Processing
Roadmap
Introduction
Intra-frame coding
Inter-frame coding
Object-based and scalable video coding*
– Why object-based? Motion segmentation, shape coding, R-D optimization
– Scalability issues: spatial/temporal/quality scalabilities
Object-based Video Coding
Waveform-based coding discussed so far uses a simple source model (e.g., H.261/263/264, MPEG-1/-2)
– Does not consider the semantic content (e.g., objects and their shapes) of the video
Object-based video coding identifies objects (or regions) in a video and encodes them. Potential benefits may include
– Improved coding efficiency
– Improved visual quality (e.g., no blocking artifacts)
– Content description
– Content-based interactivity
Also called "content-dependent video coding"
– The buzzword for MPEG-4, but less successful than expected (so the important question is to understand why it does not work so well)
Essential Tasks in Object-based Video Coding
Object/region segmentation
– Separate pixels based on their color, texture, and motion characteristics
– Closely related to motion detection and segmentation
– Intrinsically ill-defined and desperate for a breakthrough
2D shape modeling and coding
– Not all shapes are equally probable
– Subtle implications for video coding (hidden pitfalls)
2D texture modeling and coding
– Extension of existing block-based MCP to region-based MCP
– Deformable textures (tradeoff between spatial and temporal prediction)
Object/Region Segmentation
The major challenge in content/object-based coding
Common approaches for segmentation in a still image: gray-level thresholding, clustering, edge detection, region growing, splitting and merging
Object segmentation in video
– Motion information can be utilized, but how?
– Should we put more trust in motion cues or in spatial cues?
Motion-based Segmentation
Motion-based segmentation: to segment an image using motion information
– We can first estimate the motion field and then segment the motion field
– However, estimation and segmentation are like two sides of the same coin
A Mind-bothering Example
[Figure: Frame 1 and Frame 2]
It is easy to convince yourself that the tree branches are moving. But how do we know the sky is still? What if it were also moving at the same speed (shouldn't we observe the same intensity patterns, because the sky is a smooth region)?
Implications for Video Coding
True motion representation might be useful to computer vision and motion perception, but it is not indispensable in video coding
The fundamental reason lies in the relationship between motion representation and video coding: how to tolerate the uncertainty in motion?
The same issue remains in object-based image coding: how to tolerate the uncertainty in shape? (we will discuss this in more detail later)
Simplified Segmentation: Change Detection
To detect the changing parts in a video from time t_i to time t_j, we compute a difference image and threshold the difference by T:

$$
d_{ij}(x,y) =
\begin{cases}
1 & \text{if } |f(x,y,t_i) - f(x,y,t_j)| > T \\
0 & \text{otherwise}
\end{cases}
$$

d_ij(x,y) can be further processed, e.g., to remove isolated 1's or to group 1's that are close to each other
[Figure: frames f(x, y, t_i) and f(x, y, t_j)]
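The thresholded difference image and the "remove isolated 1's" post-processing can be sketched with NumPy as follows. This is a minimal illustration, not the course's reference code; the function names and the choice of an 8-neighbour test for "isolated" are assumptions.

```python
import numpy as np

def change_mask(frame_i, frame_j, T):
    """Return d_ij: 1 where |f(x,y,t_i) - f(x,y,t_j)| > T, else 0."""
    # Cast to a signed type so subtracting uint8 frames cannot wrap around.
    diff = np.abs(frame_i.astype(np.int32) - frame_j.astype(np.int32))
    return (diff > T).astype(np.uint8)

def remove_isolated_ones(mask):
    """Clear every 1 that has no 1 among its 8 neighbours (assumed definition)."""
    padded = np.pad(mask, 1)  # zero border so edge pixels have 8 neighbours
    h, w = mask.shape
    neighbours = sum(
        padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return np.where(neighbours > 0, mask, 0).astype(np.uint8)
```

Grouping nearby 1's would typically be done with a morphological closing or connected-component labelling, which is left out here.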
Change Detection: Pros and Cons
Simple to implement; fast
Detects all changes
Detects even unwanted changes
Positive and negative changes detected (occlusion)
Difficult to quantify motion
Requires a static reference frame
Change Detection: An Example
Monitor the traffic
Without a Static Reference Frame
Background extraction methods
– Ad-hoc median detector (your CA#6)
– To eliminate the impact of (small) moving objects, use the "robust estimator" approach to iteratively remove the outliers
– More sophisticated approaches model the background with a mixture of Gaussian distributions and use graph-cut based optimization
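As an illustration of the median detector, a per-pixel temporal median recovers the background as long as each pixel shows the background in more than half of the frames. The sketch below assumes grayscale frames of equal size; the function name is hypothetical and this is not the CA#6 solution.

```python
import numpy as np

def median_background(frames):
    """Estimate a static background as the per-pixel temporal median."""
    stack = np.stack(frames, axis=0)  # shape: (num_frames, H, W)
    return np.median(stack, axis=0)
```

The estimated background can then serve as the reference frame for change detection.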
Simplified Segmentation: Global Motion Estimation
Planar homography (feature-based)
– Homogeneous coordinates
– Conditions for planar homography
– Homography estimation from feature correspondence
Hierarchical model-based GME (feature-less)
– Directly minimize an energy function (the MSE of MCP errors)
– Solve the optimization problem in a coarse-to-fine fashion (more robust and efficient)
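For the feature-based route, one standard way to estimate a homography from point correspondences is the direct linear transform (DLT), sketched below. The slide does not say which estimator the course uses, so DLT is an assumption, as are the function names. Practical implementations usually normalize the coordinates first and wrap the estimate in an outlier-rejection loop.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: find the 3x3 matrix H with dst ~ H @ src in homogeneous coordinates.

    Needs at least four correspondences, no three of them collinear.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale (assumes H[2, 2] is not zero)

def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```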
Model-based GME
Target function for minimization
Solution: Gauss-Newton method
Bergen, J. R., Anandan, P., Hanna, K. J., and Hingorani, R., "Hierarchical Model-Based Motion Estimation," in Proc. of the Second European Conference on Computer Vision, pp. 237-252, 1992.
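The formulas on this slide did not survive in the transcript. One common way to write an MSE-of-MCP-error objective and its Gauss-Newton step is given below; the notation (warp w with parameters p, residual r, Jacobian J) is an assumption and may differ from the slide's.

```latex
% Energy: squared motion-compensated prediction error under the global warp w(x; p)
E(\mathbf{p}) = \sum_{\mathbf{x}} r(\mathbf{x};\mathbf{p})^2,
\qquad
r(\mathbf{x};\mathbf{p}) = I_2\big(\mathbf{w}(\mathbf{x};\mathbf{p})\big) - I_1(\mathbf{x})

% Gauss-Newton update, with J the 1 x n Jacobian of the residual
\Delta\mathbf{p} = -\Big(\sum_{\mathbf{x}} J^{T} J\Big)^{-1} \sum_{\mathbf{x}} J^{T}\, r(\mathbf{x};\mathbf{p}),
\qquad
J = \nabla I_2^{T}\, \frac{\partial \mathbf{w}}{\partial \mathbf{p}},
\qquad
\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p}
```

In a coarse-to-fine scheme, this update is iterated at the coarsest pyramid level first, and the resulting parameters initialize the next finer level.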
Multi-resolution GME
Numerical Example
Summary for Change Detection and Global Motion Estimation
Motion segmentation becomes relatively easier to solve when either the camera is still or the background objects lie on a plane
Latest advances include joint motion segmentation and estimation using level-set methods (PDE-based formulation)
Mansouri, A.-R. and Konrad, J., "Multiple motion segmentation with level sets," IEEE Transactions on Image Processing, vol. 12, no. 2, pp. 201-220, Feb. 2003.
2-D Shape Modeling and Coding
Bitmap coding: a binary map specifying whether or not a pixel belongs to an object
– A special case of the general alpha-map
Contour coding: code only the contour of the object or the region
– Chain codes
– Polygon approximation
– Spline approximation
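As a small illustration of contour coding, an 8-direction Freeman chain code stores the start pixel plus one direction symbol (0 to 7) per contour step. The direction numbering below (counter-clockwise from east, in row/column coordinates) is one common convention and is assumed here; the function names are hypothetical.

```python
# 8-connected Freeman directions as (row step, column step),
# indexed counter-clockwise starting from east; "north" decreases the row index.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(contour):
    """Encode an ordered list of 8-connected contour pixels as chain codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        codes.append(DIRECTIONS.index((r1 - r0, c1 - c0)))
    return codes

def decode_chain(start, codes):
    """Rebuild the contour from its start pixel and chain codes."""
    points = [start]
    for code in codes:
        dr, dc = DIRECTIONS[code]
        r, c = points[-1]
        points.append((r + dr, c + dc))
    return points
```

Each step costs at most 3 bits before entropy coding, compared with one bit per pixel of the whole bounding box for a raw bitmap.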
Image Matting (Soft Segmentation)

$$
X(i,j) = \alpha(i,j)\,F(i,j) + [1 - \alpha(i,j)]\,B(i,j), \qquad 0 \le \alpha(i,j) \le 1
$$

Not for coding but for interactive editing
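The compositing model can be written directly with NumPy, as in the minimal sketch below (the function name is an assumption). With alpha restricted to 0 or 1 it reduces to a binary bitmap mask.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Per-pixel X = alpha * F + (1 - alpha) * B, with alpha clipped to [0, 1]."""
    alpha = np.clip(np.asarray(alpha, dtype=np.float64), 0.0, 1.0)
    return alpha * foreground + (1.0 - alpha) * background
```

Matting itself is the inverse problem: estimating alpha, F, and B from the observed X, which is under-determined without user input or priors.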
2-D Texture Modeling and Coding*
Shape-adaptive DCT
Shape-adaptive wavelet transform
Roadmap
Introduction
Intra-frame coding
– Review of JPEG
Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)
Scalable video coding
– 3D subband/wavelet coding and recent trends
Scalable vs. Multicast
What is scalable coding?
Multicast: foreman.yuv → foreman128k.cod, foreman256k.cod, foreman512k.cod, foreman1024k.cod
Scalable coding: foreman.yuv → foreman.cod (one bitstream, with truncation points at 128, 256, 512, and 1024)
Spatial Scalability
[Figure: embedded bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0, decoded at increasing spatial resolution]
Temporal Scalability
[Figure: embedded bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0, decoded at different frame rates]
Frames 0, 1, 2, 3, 4, 5, …: 30 Hz
Frames 0, 2, 4, 6, 8, …: 15 Hz
Frames 0, 4, 8, 12, …: 7.5 Hz
SNR (Rate) Scalability
[Figure: embedded bitstream 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0, decoded at PSNR_avg = 30 dB, 35 dB, and 40 dB]

$$
\mathrm{PSNR}_{avg} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PSNR}_i
$$

PSNR_i: PSNR of frame i
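The average PSNR can be computed as in the sketch below, assuming 8-bit frames (peak value 255) stored as NumPy arrays; the function names are hypothetical. Averaging per-frame PSNR values is not the same as taking the PSNR of the average MSE, so reported numbers should state which one is used.

```python
import numpy as np

def psnr(reference, decoded, peak=255.0):
    """PSNR in dB of one decoded frame against its reference."""
    err = reference.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def average_psnr(ref_frames, dec_frames):
    """PSNR_avg: arithmetic mean of the per-frame PSNR values."""
    return float(np.mean([psnr(r, d) for r, d in zip(ref_frames, dec_frames)]))
```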
Scalability via Bit-Plane Coding

$$
A = \pm\left(a_0 + a_1 2 + a_2 2^2 + \cdots + a_7 2^7\right)
$$

a_0: least significant bit (LSB); a_7: most significant bit (MSB); the sign is carried by a separate sign bit
Example: A = 129 gives sign = +, a0 a1 a2 … a7 = 10000001
Example: sign = −, a0 a1 a2 … a7 = 00110011 gives A = −(4 + 8 + 64 + 128) = −204
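The sign-magnitude decomposition above, and the effect of receiving only the most significant planes, can be sketched in a few lines. The function names and the `planes_received` parameter are assumptions made for illustration.

```python
def to_bit_planes(value, num_bits=8):
    """Split an integer into a sign and magnitude bits [a0, ..., a7], LSB first."""
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    bits = [(magnitude >> k) & 1 for k in range(num_bits)]
    return sign, bits

def from_bit_planes(sign, bits, planes_received=None):
    """Rebuild the value from the `planes_received` most significant bit planes."""
    num_bits = len(bits)
    if planes_received is None:
        planes_received = num_bits
    lowest = num_bits - planes_received  # index of the least significant plane kept
    return sign * sum(bits[k] << k for k in range(lowest, num_bits))
```

Sending planes from MSB to LSB yields an embedded bitstream: for A = −204, decoding only the top two planes gives −192, and each further plane refines the value.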
Why Is DPCM Bad for Scalability?
[Figure: prediction structures over frames 1, 2, 3, … for three layers: base layer (Ibase P P P), enhancement layer 1 (Ienh1 P P P), and enhancement layer 2 (Ienh2 P P P)]
Suffers from the drifting problem
Suffers from coding efficiency loss
Fine Granular Scalability (FGS)
[Figure: H.264 with/without the FGS option on the Foreman sequence (5 fps); base layer at 20 kbps, enhancement layer at variable bit-rate; efficiency gap of about 2 dB]
3D Wavelet/Subband Coding
[Figure: video volume with axes x, y, and t]
2D spatial WT + 1D temporal WT
Wavelet Video Coder
[Figure: original video frames 0 to 7 → temporal wavelet transform (temporal subbands LLL, LLH, LH, and H) → spatial wavelet transform → embedded quantization & entropy coding]
[Taubman & Zakhor, 1994] [Ohm, 1994] [Choi & Woods, 1999] [Hsiang & Woods, VCIP '99] . . . and others
Motion-Adaptive 3D Wavelet Transform
Recall Haar transform

$$
s(n) = \tfrac{1}{2}\big(x(2n) + x(2n+1)\big), \qquad d(n) = x(2n+1) - x(2n)
$$

Lifting-based implementation

$$
d(n) = x(2n+1) - x(2n), \qquad s(n) = x(2n) + \tfrac{1}{2}\, d(n)
$$

Motion-adaptive Haar transform

$$
d_n = f_{2n+1} - W[f_{2n}], \qquad s_n = f_{2n} + \tfrac{1}{2}\, W^{-1}[d_n]
$$

W, W^{-1}: forward and backward motion vectors
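A useful property of the lifting form is that it stays exactly invertible whatever motion operator is plugged in, because synthesis simply undoes the update and predict steps in reverse order. The sketch below illustrates this with a deliberately simple, hypothetical motion model (one global circular shift per frame pair); real coders use block-based or mesh-based motion fields, and the sign convention chosen here (high band = odd frame minus warped even frame) is an assumption.

```python
import numpy as np

def warp(frame, shift):
    """Hypothetical motion operator W: a global circular shift by (dy, dx) pixels."""
    return np.roll(frame, shift, axis=(0, 1))

def analyze(even, odd, motion):
    """Predict step, then update step: returns the (low, high) temporal bands."""
    dy, dx = motion
    high = odd - warp(even, (dy, dx))           # d_n = f_{2n+1} - W[f_{2n}]
    low = even + 0.5 * warp(high, (-dy, -dx))   # s_n = f_{2n} + (1/2) W^{-1}[d_n]
    return low, high

def synthesize(low, high, motion):
    """Undo the update step, then the predict step."""
    dy, dx = motion
    even = low - 0.5 * warp(high, (-dy, -dx))
    odd = high + warp(even, (dy, dx))
    return even, odd
```

When the motion model is accurate, the high band is close to zero and the low band is close to the even frame, which is what makes the temporal transform efficient.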
Lifting
[Figure: lifting structure with motion compensation. Analysis: even frames and odd frames pass through predict (P) and update (U) steps and gains G0 and G1 to produce the low band and the high band. Synthesis: the low band and the high band pass through the inverse gains and the reversed U and P steps to recover the even and odd frames.]
[Secker & Taubman, 2001] [Popescu & Bottreau, 2001]
MC Wavelet Coding vs. H.264/AVC
[Figure: luminance PSNR (dB, 20 to 38) versus bit-rate (Mbps, 0.2 to 2.0) for the Mobile CIF sequence; curves: scalable MC 5/3 wavelet and non-scalable H.264/AVC]
H.264/AVC
• high complexity RD control
• CABAC
• PBBPBBP . . .
• 5 prev/3 future reference frames
• data courtesy of M. Flierl
[Taubman & Secker, VCIP 2003] courtesy D. Taubman
Wavelet Synthesis with Lossy Motion Vector
[Figure: video in → motion estimator and MC wavelet transform → two embedded encoders → two decoders → inverse wavelet transform → video out; the encoders minimize J = D + λR]
[Taubman & Secker, ICIP03]
R-D Performance with Lossy Motion Vector
[Figure: video PSNR (dB, 24 to 40) versus bit-rate (kbps, 0 to 1200) for CIF Foreman; curves: embedded wavelet coefficients with lossless motion, non-embedded single-rate, and embedded wavelet coefficients with lossy motion]
[Taubman & Secker, VCIP 2003] courtesy D. Taubman
Surprising Success of ITU-T Rec. H.263
What H.263 was developed for . . . analog videophone
. . . and what it was used for: Internet video streaming
What is Streaming Video?
[Figure: data path from the source (cnn.com) across the Internet (Domains A, B, and C, connected through access switches) to Receiver 1 and Receiver 2 (RealPlayer)]
• Download mode: no delay bound
• Streaming mode: delay bound
Outline
• Challenges for quality video transport
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media
• Summary
Time-varying Available Bandwidth
[Figure: data path from the source (cnn.com) through Domain A and Domain B (access switches) to the receiver (RealPlayer) over a 56 kb/s link; two cases: R >= 56 kb/s and R < 56 kb/s]
No bandwidth reservation
Time-varying Delay
[Figure: data path from the source (cnn.com) through Domain A and Domain B (access switches) to the receiver (RealPlayer) over a 56 kb/s link]
Delayed packets regarded as lost
Effect of Packet Loss
[Figure: data path from the source through Domain A and Domain B (access switches) to the receiver; received video with no packet loss versus with loss of packets]
No retransmission
Unicast vs. Multicast
[Figure: unicast delivery versus multicast delivery]
Pros and cons?
Heterogeneity for Multicast
[Figure: a source multicasts across the Internet (Domains A, B, and C; access switches; a gateway; Ethernet and telephone networks) to Receivers 1, 2, and 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s]
• Network heterogeneity
• Receiver heterogeneity
What quality?
Video Compression
[Figure: a layered coder produces Layer 0, Layer 1, and Layer 2; decoders (D) add up the layers they receive, serving receivers at 64 kb/s, 256 kb/s, and 1 Mb/s]
Layered video encoding/decoding. D denotes the decoder.
Application of Layered Video
[Figure: a source sends layered video by IP multicast across the Internet (Domains A, B, and C; access switches; a gateway; Ethernet and telephone networks) to Receivers 1, 2, and 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s]
Application-layer QoS Control
Congestion control (using rate control):
– Source-based, requires rate-adaptive compression or rate shaping
– Receiver-based
– Hybrid
Error control:
– Forward error correction (FEC)
– Retransmission
– Error resilient compression
– Error concealment
Congestion Control
• Window-based vs. rate control (pros and cons?)
[Figure: window-based control versus rate control]
Video Multicast
• How to extend source-based rate control to multicast?
• Limitation of source-based rate control in multicast
• Trade-off between bandwidth efficiency and service flexibility
Receiver-based Rate Control
[Figure: IP multicast for layered video; a source sends across the Internet (Domains A, B, and C; access switches; a gateway; Ethernet and telephone networks) to Receivers 1, 2, and 3 over links of 1 Mb/s, 256 kb/s, and 64 kb/s]
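With layered multicast, each receiver can do its own rate control by deciding how many layers to subscribe to. The sketch below shows only that decision, under stated assumptions: every layer has a known rate, layers must be taken in order starting from the base layer, and the available bandwidth is already measured. The per-layer rates in the usage note (64, 192, and 744 kb/s, so that the cumulative rates are 64 kb/s, 256 kb/s, and 1 Mb/s) are hypothetical values chosen to match the link speeds in the figure.

```python
def layers_to_subscribe(layer_rates_kbps, available_kbps):
    """Return how many layers, base layer first, fit in the available bandwidth."""
    total = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > available_kbps:
            break  # this layer and all higher ones are dropped
        total += rate
        count += 1
    return count
```

For example, `layers_to_subscribe([64, 192, 744], 256)` selects the base layer and the first enhancement layer. A deployed receiver would also need to probe for spare bandwidth and react to loss, which is not modelled here.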
Error Control
• FEC
– Channel coding
– Source coding-based FEC
– Joint source/channel coding
• Delay-constrained retransmission
• Error resilient compression
• Error concealment
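As a minimal illustration of channel-coding FEC, a single XOR parity packet per group lets the receiver rebuild any one lost packet without retransmission. This is a toy scheme chosen for brevity, not the specific code used in any streaming system; the function names are assumptions, and the packets in a group must have equal length.

```python
def xor_parity(packets):
    """Compute one parity packet as the bytewise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) from the rest plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    restored = list(received)
    # XOR of all surviving packets and the parity equals the lost packet.
    restored[missing[0]] = xor_parity(present + [parity])
    return restored
```

The cost is one extra packet per group; stronger codes such as Reed-Solomon tolerate more losses at a higher overhead.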
Continuous Media Distribution Services
• Content replication (caching & mirroring)
• Network filtering/shaping/thinning
• Application-level multicast (overlay networks)
Caching
• What is caching?
• Why use caching? Does WWW mean World Wide Wait?
• Pros and cons?
Streaming Server
• Different from a web server
– Timing constraints
– Video-cassette-recorder (VCR) functions (e.g., fast forward/backward, random access, and pause/resume)
• Design of streaming servers
– Real-time operating system
– Special disk scheduling schemes
Media Synchronization
• Why media synchronization?
• Example: lip-synchronization (video/audio)
Protocols for Streaming Video
• Network-layer protocol: Internet Protocol (IP)
• Transport protocol:
– Lower layer: UDP & TCP
– Upper layer: Real-time Transport Protocol (RTP) & RTP Control Protocol (RTCP)
• Session control protocol:
– Real-Time Streaming Protocol (RTSP): RealPlayer
– Session Initiation Protocol (SIP): Microsoft Windows MediaPlayer; Internet telephony
Summary
• Challenges for quality video transport
– Time-varying available bandwidth
– Time-varying delay
– Packet loss
• An architecture for video streaming
– Video compression
– Application-layer QoS control
– Continuous media distribution services
– Streaming server
– Media synchronization mechanisms
– Protocols for streaming media