1
Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection
Bongsoo Jung, Byeungwoo Jeon
Journal of Visual Communication and Image Representation 2008
2
Outline
Introduction
Complexity Analysis
Method
Pre Macroblock Mode Selection
Adaptive Slice-level Parallelism
Experimental Results
Conclusions
3
Introduction
H.264/AVC achieves high coding efficiency through variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.
The cost is high computational complexity
Remedies: complexity reduction algorithms and parallel processing
4
Introduction
GOP level: simple, but high latency
Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability
Slice level: slices are encoded independently, but coding efficiency is lower
Macroblock level: high dependency among MBs
5
Introduction
The MBs in a slice may not have similar computational complexity, so slices finish at different times and some threads incur unnecessary extra waiting time.
[Figure: encoding time of slices 0 to 7 on processor units PU0 to PU7]
6
Main Purpose
Objective: use a parallel algorithm to speed up the H.264/AVC encoder
Maximize parallel efficiency by distributing the workload equally among processor units
Method:
Pre-processing: fast MB mode selection
Adaptive slice-level parallelism
7
Complexity Analysis
[Figure: inter prediction modes (MB partitions) in H.264]
Intra prediction modes: 4×4, 16×16
8
Complexity Analysis
Run-time complexity profile of the H.264/AVC encoder (Pentium IV 2.4 GHz, Foreman CIF, IPPP structure)
9
Pre Macroblock Mode Selection: Overview
Why? Motion estimation (ME) over variable block sizes has high computational complexity
Remove unnecessary ME block sizes and the RD cost calculation of intra prediction modes
This removal leads to:
Complexity reduction
Workload balancing among slices
10
Pre Macroblock Mode Selection: Inter MB mode selection
MC block sizes in video sequences: foreground region, 8×8 or smaller; non-moving region, 16×16
Because of high temporal correlation, check the consistency history of the 16×16 block size and of zero MVs
Two measurements:
Zero motion consistency (ZMC)
Large block consistency (LBC)
11
Pre Macroblock Mode Selection: Inter MB mode selection
Zero Motion Consistency (ZMC): indicates for how many consecutive frames a specified block has had a zero MV
When a block is encoded in intra mode, its ZMC is set to 0
t: frame index; $ZMC_0 = 0$
(n,m;i,j): the 4×4 block at position (n,m) within MB (i,j)
A high ZMC value means a high probability of belonging to the background region
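The ZMC recurrence itself is not reproduced in this transcript. The Python sketch below is one reading of the definition above, assuming the counter of a 4×4 block grows by one for each consecutive zero-MV frame and drops to 0 on a non-zero MV or on intra coding; the function name `update_zmc` is hypothetical.

```python
# Hypothetical sketch: Zero Motion Consistency (ZMC) update for one 4x4 block.
# Assumption: ZMC counts consecutive frames with a zero MV; a non-zero MV or
# intra coding resets it to 0. ZMC_0 = 0 before the first frame.

def update_zmc(prev_zmc, mv, is_intra):
    """Return ZMC_t(n,m;i,j) from ZMC_{t-1}(n,m;i,j) and the block's coding result."""
    if is_intra:
        return 0  # intra-coded block: ZMC is reset
    if mv == (0, 0):
        return prev_zmc + 1  # one more consecutive zero-MV frame
    return 0  # non-zero motion breaks the run


if __name__ == "__main__":
    zmc = 0  # ZMC_0 = 0
    history = [((0, 0), False), ((0, 0), False), ((1, -2), False), ((0, 0), False)]
    for mv, intra in history:
        zmc = update_zmc(zmc, mv, intra)
    print(zmc)  # 1: the run restarted after the non-zero MV
```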
12
Pre Macroblock Mode Selection: Inter MB mode selection
Zero Motion Consistency Score (ZMCS): indicates how likely an MB is to be a stationary region
TMOTION: a threshold value
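The ZMCS equation is also missing here. As a hedged sketch, assume an MB is labeled High only when every one of its sixteen 4×4 blocks has a ZMC of at least TMOTION, and Low otherwise; TMOTION = 4 is the value used later in the slides, while the "all blocks" rule and the name `zmcs` are assumptions made for illustration.

```python
# Hypothetical sketch of the Zero Motion Consistency Score (ZMCS) of one MB.
# Assumption: ZMCS is "High" only if all sixteen 4x4 blocks of the MB have
# ZMC >= T_MOTION; otherwise "Low". T_MOTION = 4 follows the slides.

T_MOTION = 4


def zmcs(block_zmc, t_motion=T_MOTION):
    """Classify an MB from the ZMC values of its 16 4x4 blocks."""
    if len(block_zmc) != 16:
        raise ValueError("expected ZMC values for 16 4x4 blocks")
    return "High" if min(block_zmc) >= t_motion else "Low"


if __name__ == "__main__":
    print(zmcs([5] * 16))        # High: every block static for at least 4 frames
    print(zmcs([5] * 15 + [2]))  # Low: one block moved recently
```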
13
Pre Macroblock Mode Selection: Inter MB mode selection
Large Block Consistency (LBC): indicates the number of consecutive frames in which the (i,j)th MB has had a 16×16 MC block size
When a block is encoded in intra mode, its LBC is set to 0
$bestMode_t(i,j)$: the best MB mode of the (i,j)th MB in the t-th frame
$LBC_0 = 0$
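A minimal sketch of the LBC counter, under the assumption that both SKIP and P16×16 count as a 16×16 MC block size and that any smaller partition or an intra mode ends the run; the mode strings and the name `update_lbc` are hypothetical labels, not identifiers from the paper.

```python
# Hypothetical sketch of the Large Block Consistency (LBC) update for MB (i,j).
# Assumptions: SKIP and P16x16 both use a 16x16 MC block; every other mode
# (smaller inter partitions or intra) resets the run. LBC_0 = 0.

LARGE_BLOCK_MODES = {"SKIP", "P16x16"}


def update_lbc(prev_lbc, best_mode):
    """Return LBC_t(i,j) given LBC_{t-1}(i,j) and bestMode_t(i,j)."""
    # Intra-coded MBs and smaller inter partitions both end the 16x16 run.
    return prev_lbc + 1 if best_mode in LARGE_BLOCK_MODES else 0


if __name__ == "__main__":
    lbc = 0  # LBC_0 = 0
    for mode in ["P16x16", "SKIP", "P8x8", "P16x16"]:
        lbc = update_lbc(lbc, mode)
    print(lbc)  # 1
```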
14
Pre Macroblock Mode Selection: Inter MB mode selection
Large Block Consistency Score (LBCS): indicates how likely an MB is to be partitioned as 16×16
TMODE1, TMODE2: threshold values used to assess the LBC
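The slides later use three LBCS levels (High, Medium, Low) with TMODE1 = 1 and TMODE2 = 4, but the exact comparison rule is not shown. A sketch, assuming High means LBC ≥ TMODE2, Medium means TMODE1 ≤ LBC < TMODE2, and Low means LBC < TMODE1 (the boundaries and the name `lbcs` are assumptions):

```python
# Hypothetical sketch of the three-level Large Block Consistency Score (LBCS).
# Assumed rule: High if LBC >= T_MODE2, Medium if T_MODE1 <= LBC < T_MODE2,
# Low otherwise. Threshold values T_MODE1 = 1, T_MODE2 = 4 follow the slides.

T_MODE1, T_MODE2 = 1, 4


def lbcs(lbc, t_mode1=T_MODE1, t_mode2=T_MODE2):
    """Map an MB's LBC counter to an LBCS level."""
    if lbc >= t_mode2:
        return "High"
    if lbc >= t_mode1:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    print([lbcs(v) for v in range(6)])
    # ['Low', 'Medium', 'Medium', 'Medium', 'High', 'High']
```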
15
Pre Macroblock Mode Selection: Inter MB mode selection
An illustration of LBCS
16
Pre Macroblock Mode Selection: Inter MB mode selection
Conditional probability of MB modes given ZMCS = High
Block sizes other than SKIP and P16×16 are very unlikely to appear (probability below about 0.04)
SKIP and P16×16 modes can therefore be detected early
TMOTION = 4
17
Pre Macroblock Mode Selection: Inter MB mode selection
Joint conditional probability of MB modes given LBCS, when ZMCS = Low
A: LBCS = High, B: LBCS = Medium, C: LBCS = Low
TMODE1 = 1, TMODE2 = 4
18
Pre Macroblock Mode Selection: Selective intra mode selection
Computing the RD costs of intra modes imposes a high computational load
Idea: compare the temporal correlation of the current MB with its spatial correlation before coding the frame
19
Pre Macroblock Mode Selection: Selective intra mode selection
Mean Absolute Temporal Difference (MATD)
Mean Absolute Spatial Difference (MASD)
$c_{x,y}$: pixel value at location (x,y) of the MB in the current frame
$r_{x,y}$: pixel value at location (x,y) of the MB in the previous frame
X, Y: horizontal and vertical dimensions of an MB
$MASD_H$: the MASD between horizontally neighboring pixels
$MASD_V$: the MASD between vertically neighboring pixels
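The MATD and MASD formulas are not preserved in this transcript. The sketch below assumes each measure is a plain mean of absolute differences over all available pixel pairs (X·Y pairs for MATD, (X−1)·Y for $MASD_H$, X·(Y−1) for $MASD_V$); that normalization is an assumption, and the MB is passed as a list of rows indexed `[y][x]`.

```python
# Hypothetical sketch of MATD, MASD_H and MASD_V for one MB.
# Assumption: each measure is the mean absolute difference over all pixel
# pairs it compares; blocks are lists of rows, indexed block[y][x].


def matd(cur, ref):
    """Mean absolute temporal difference between current and previous-frame MB."""
    Y, X = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - ref[y][x]) for y in range(Y) for x in range(X))
    return total / (X * Y)


def masd_h(cur):
    """Mean absolute difference between horizontally neighboring pixels."""
    Y, X = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y][x + 1]) for y in range(Y) for x in range(X - 1))
    return total / ((X - 1) * Y)


def masd_v(cur):
    """Mean absolute difference between vertically neighboring pixels."""
    Y, X = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y + 1][x]) for y in range(Y - 1) for x in range(X))
    return total / (X * (Y - 1))


if __name__ == "__main__":
    cur = [[x for x in range(16)] for _ in range(16)]  # horizontal ramp
    ref = [[v + 2 for v in row] for row in cur]        # same ramp, brighter by 2
    print(matd(cur, ref), masd_h(cur), masd_v(cur))    # 2.0 1.0 0.0
```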
20
Pre Macroblock Mode Selection: Selective intra mode selection
MATD and MASD are compared to determine whether the RD costs of intra modes should be calculated for the current MB
A larger w makes it easier to skip the intra mode search
A smaller QP incurs more intra modes than a larger QP
w: weighting factor, currently set to 0.6
Intra search is skipped when the MB is more temporally correlated than spatially correlated
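The comparison itself is not reproduced. One rule consistent with "a larger w makes skipping easier" is to skip the intra RD search when MATD < w · MASD. The sketch assumes that rule and, additionally, that MASD is taken as the smaller of $MASD_H$ and $MASD_V$; both choices and the function name `chk_intra` are illustrative assumptions.

```python
# Hypothetical sketch of the selective intra check.
# Assumptions: intra RD costs are skipped when MATD < w * MASD, and MASD is
# the smaller (more correlated) of MASD_H and MASD_V. w = 0.6 as in the slides.

W = 0.6


def chk_intra(matd, masd_h, masd_v, w=W):
    """Return 1 if intra-mode RD costs should be evaluated for this MB, else 0."""
    masd = min(masd_h, masd_v)  # assumed combination of the two directions
    temporally_more_correlated = matd < w * masd
    return 0 if temporally_more_correlated else 1


if __name__ == "__main__":
    print(chk_intra(2.0, 10.0, 8.0))         # 0: 2.0 < 0.6 * 8.0, intra search skipped
    print(chk_intra(6.0, 10.0, 8.0))         # 1: intra modes are checked
    print(chk_intra(6.0, 10.0, 8.0, w=0.9))  # 0: a larger w skips more easily
```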
21
Pre Macroblock Mode Selection: MB mode classification
Decision table of candidate MB modes
Block diagram of MB mode selection
22
Adaptive Slice-level Parallelism: Overview
Characteristics: easy to implement, low inter-communication overhead among processor units, good scalability, but increased bitrate
A slice boundary is defined on the basis of either a fixed number of MBs or a fixed number of bits
It is hard to decide slice boundaries prior to encoding
23
Adaptive Slice-level Parallelism: Fixed MB assignment
The number of consecutive MBs in each slice is fixed
L: the number of processor units on a multi-core system
M: the total number of MBs in a frame
i: slice index
Example: with L = 8 processor units and a CIF sequence (352×288), M = (352/16)×(288/16) = 22×18 = 396
Since 396/8 = 49.5, about 49 MBs are assigned to each slice
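The fixed-assignment formula is not shown, and 396 MBs do not divide evenly into 8 slices. As a sketch, assume the leftover MBs are handed out one per slice to the first slices (one of several possible conventions; `fixed_assignment` is a hypothetical name):

```python
# Hypothetical sketch of fixed MB assignment: M MBs split into L slices of
# (nearly) equal MB count. Assumption: the M mod L leftover MBs go one each
# to the first slices.


def fixed_assignment(total_mbs, num_pus):
    """Return the number of consecutive MBs in each of num_pus slices."""
    base, extra = divmod(total_mbs, num_pus)
    return [base + 1 if i < extra else base for i in range(num_pus)]


if __name__ == "__main__":
    mbs = (352 // 16) * (288 // 16)  # 22 * 18 = 396 MBs in a CIF frame
    print(mbs, fixed_assignment(mbs, 8))  # 396 [50, 50, 50, 50, 49, 49, 49, 49]
```

Equal MB counts do not imply equal encoding time, which is exactly the imbalance the following slides address.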
24
Adaptive Slice-level Parallelism: Fixed MB assignment
The scheduling of slice-level parallelism on eight processor units
[Figure: encoding time of slices 0 to 7 on PU0 to PU7; left panel: ideal case, right panel: practical case, where the slowest slice is the bottleneck]
25
Adaptive Slice-level Parallelism: Fixed MB assignment
The imbalance of the computational load distribution among slices
[Figure panels: exhaustive search method; fast ME / fast mode search]
26
Adaptive Slice-level Parallelism: Fixed MB assignment
Computational load for encoding one frame with slice-level parallelism
Computational load of the t-th frame on a single processor system
$C_t^{slice}(i)$: the computational load of the i-th slice in the t-th frame
L: the number of slices in a frame
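The load equations on this slide are not preserved. Under the usual assumptions that the L slices are encoded concurrently, one per processor unit, and that communication overhead is ignored, a plausible form is shown below; the symbol $C_t^{par}$ for the parallel load is introduced here for illustration.

```latex
C_t = \sum_{i=0}^{L-1} C_t^{slice}(i),
\qquad
C_t^{par} = \max_{0 \le i \le L-1} C_t^{slice}(i)
```

The single-processor load is the sum over all slices, while the parallel frame time is set by the slowest slice.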
27
Adaptive Slice-level Parallelism: Fixed MB assignment
The speedup of a multiprocessor system over a single processor system
To achieve the maximum speedup, the computational loads of the slices should be as similar as possible
This motivates an adaptive slice partition method
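Assuming the speedup is the single-processor load divided by the load of the slowest slice (no overhead), a small numeric example shows why balance matters: the same total work spread unevenly over eight slices loses a large part of the ideal 8× speedup. The function name and load values are illustrative.

```python
# Hypothetical sketch: ideal slice-level speedup, assuming
# speedup = sum of slice loads / max slice load (overhead ignored).


def slice_speedup(slice_loads):
    """Speedup of L concurrently encoded slices over one processor."""
    return sum(slice_loads) / max(slice_loads)


if __name__ == "__main__":
    balanced = [10.0] * 8
    unbalanced = [6.0, 8.0, 10.0, 16.0, 12.0, 9.0, 11.0, 8.0]  # same total: 80
    print(slice_speedup(balanced))    # 8.0
    print(slice_speedup(unbalanced))  # 5.0, limited by the 16.0 slice
```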
28
Adaptive Slice-level Parallelism: Complexity estimation model
A simple estimation method that utilizes the result of the fast MB mode selection
Define a group value g corresponding to the candidate MB modes
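The decision table that maps the pre-selection results to g is not reproduced in this transcript. Purely for illustration, the sketch assumes g grows with the number of candidate modes: g = 1 when ZMCS = High (only SKIP and P16×16 remain), and g = 2, 3, 4 for LBCS = High, Medium, Low when ZMCS = Low. This mapping and the name `group_value` are hypothetical.

```python
# Hypothetical mapping from the pre-selection scores to the group value g.
# Assumption (not taken from the slides): fewer candidate modes -> smaller g.


def group_value(zmcs, lbcs):
    """Return g in {1, 2, 3, 4} for one MB from its ZMCS and LBCS labels."""
    if zmcs == "High":
        return 1  # early-detected SKIP / P16x16 only
    # ZMCS = Low: the LBCS level decides how many partitions are searched
    return {"High": 2, "Medium": 3, "Low": 4}[lbcs]


if __name__ == "__main__":
    print(group_value("High", "Low"))    # 1
    print(group_value("Low", "Medium"))  # 3
```

Under this assumed mapping, an MB in group 4 would cost about α3 = 5.28 times an MB in group 1 when the intra check is skipped.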
29
Adaptive Slice-level Parallelism: Complexity estimation model
Complexity model
$C_k^{CHKIntra}(g)$: complexity cost of the k-th MB
g: group index
CHKIntra: whether intra modes are checked (1) or skipped (0) for the MB
$e_{Inter}$: estimated complexity cost of inter mode in g = 1
$e_{Intra}$: complexity cost of the intra mode check in g = 1
α1, α2, α3, β1, β2, β3: weighting values of the complexity cost
30
Adaptive Slice-level Parallelism: Complexity estimation model
Relative computational load
CHKIntra = 0, assuming $e_{Inter} = 1$, $e_{Intra} = 0$ and $\alpha_1 = 2.42$, $\alpha_2 = 3.12$, $\alpha_3 = 5.28$:
$$C_k^{CHKIntra=0}(g) = \begin{cases} 1, & g = 1 \\ 2.42, & g = 2 \\ 3.12, & g = 3 \\ 5.28, & g = 4 \end{cases}$$
CHKIntra = 1, assuming $e_{Inter} = 1$, $e_{Intra} = 3.97$ and $\beta_1 = 0.82$, $\beta_2 = 0.83$, $\beta_3 = 0.84$:
$$C_k^{CHKIntra=1}(g) = \begin{cases} 4.97, & g = 1 \\ 6.84, & g = 2 \\ 7.23, & g = 3 \\ 9.48, & g = 4 \end{cases}$$
31
Adaptive Slice-level Parallelism: Adaptive MB assignment
The total computational load at the t-th frame:
$$\tilde{C}_t = \sum_{k=0}^{M-1} C_k^{CHKIntra}(g)$$
Ideal computational load of each slice for a uniform workload distribution:
$$\tilde{C}_t^{slice} = \frac{\tilde{C}_t}{L}$$
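The slides show the resulting MB-to-slice assignment but not the boundary search itself. A minimal sketch, assuming slices are contiguous runs of MBs in raster order and that a slice is closed as soon as the cumulative estimated load reaches the next multiple of $\tilde{C}_t^{slice}$ (a greedy rule chosen here for illustration), could look like this; `adaptive_assignment` is a hypothetical name.

```python
# Hypothetical sketch of adaptive MB assignment from per-MB cost estimates.
# Assumptions: slices are contiguous in raster order; a slice closes once the
# cumulative load reaches the next multiple of C~_t / L; every slice keeps at
# least one MB.


def adaptive_assignment(mb_costs, num_slices):
    """Return the number of MBs in each slice so slice loads approach C~_t / L."""
    if num_slices < 1 or len(mb_costs) < num_slices:
        raise ValueError("need at least one MB per slice")
    target = sum(mb_costs) / num_slices  # ideal per-slice load C~_t^slice
    counts, run, acc = [], 0, 0.0
    for k, cost in enumerate(mb_costs):
        acc += cost
        run += 1
        slices_after = num_slices - len(counts) - 1  # slices still to open
        mbs_after = len(mb_costs) - k - 1            # MBs not yet assigned
        reached = acc >= target * (len(counts) + 1)
        must_close = mbs_after == slices_after       # leave one MB per later slice
        if slices_after > 0 and (reached or must_close) and mbs_after >= slices_after:
            counts.append(run)
            run = 0
    counts.append(run)  # the last slice takes the remaining MBs
    return counts


if __name__ == "__main__":
    costs = [1, 1, 1, 1, 5.28, 5.28, 1, 1]  # illustrative per-MB estimates C_k(g)
    print(adaptive_assignment(costs, 2))   # [5, 3]: loads 9.28 and 7.28
```

With a fixed split of 4 and 4 MBs the same frame would have slice loads of 4 and 12.56, so the slowest slice drops from 12.56 to 9.28.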
32
Adaptive Slice-level Parallelism: Adaptive MB assignment
MB assignment of slices
The load is distributed much more evenly across slices than with fixed MB assignment
33
Adaptive Slice-level Parallelism: Adaptive MB assignment
Block diagram of the entire scheme
34
Experimental Results: Overview
Performance comparison between the proposed MB mode decision and the conventional method
Comparison of adaptive slice-level parallelism with fixed slice-level parallelism
35
Experimental Results: MB mode selection
Average encoding time saving, AST [%]
BDPSNR and BDBR are used to measure performance against FULL_1Slice
FULL_1Slice: exhaustive method
FMD_1Slice: fast MB mode search method
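AST is not defined explicitly on this slide. Assuming the common definition, the per-test-point saving is (T_FULL − T_FMD) / T_FULL × 100, averaged over the test points (for example, the QPs). The function name and the timing values below are illustrative.

```python
# Hypothetical sketch of average encoding time saving AST[%], assuming
# AST = mean over test points of (T_ref - T_prop) / T_ref * 100.


def average_time_saving(t_ref, t_prop):
    """Mean percentage time saving of t_prop (e.g. FMD_1Slice) vs t_ref (FULL_1Slice)."""
    if len(t_ref) != len(t_prop) or not t_ref:
        raise ValueError("need matching, non-empty lists of encoding times")
    savings = [(r - p) * 100.0 / r for r, p in zip(t_ref, t_prop)]
    return sum(savings) / len(savings)


if __name__ == "__main__":
    # illustrative encoding times in seconds at two QPs
    print(average_time_saving([100.0, 200.0], [40.0, 100.0]))  # 55.0
```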
36
Experimental Results: Rate-distortion curves
37
Experimental Results
R-D performance compared to one slice per frame (FMD_1Slice)
38
Experimental Results: Rate-distortion curves
39
Experimental Results: Slice-level parallelism
Comparing adaptive and fixed slice-level parallelism
Speedup
$$Speedup_{FixedFMD} = \frac{EncTime(FMD\_1Slice)}{\max_i EncTime^{slice}_{FixedFMD}(i) + OverheadTime}$$
$$Speedup_{AdaptiveFMD} = \frac{EncTime(FMD\_1Slice)}{\max_i EncTime^{slice}_{AdaptiveFMD}(i) + OverheadTime}$$
$EncTime(FMD\_1Slice)$: encoding time with one slice per frame on a single processor system
$\max_i EncTime^{slice}_{FixedFMD}(i)$: the longest encoding time of a slice using fixed MB assignment
$\max_i EncTime^{slice}_{AdaptiveFMD}(i)$: the longest encoding time of a slice using adaptive MB assignment
40
Experimental Results: Speedup
41
Conclusions
Proposed a fast MB mode selection using the consistency history of the 16×16 block size and of zero MVs
Proposed a selective intra mode selection that compares temporal and spatial correlation
Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder