[IEEE IEEE Globecom 2006 - San Francisco, CA, USA (2006.11.27-2006.12.1)] IEEE Globecom 2006 -...

H.264 Bit-rate Control Using the 3-D Perceptual Quantization Modeling

Chung-Ming Huang and Chung-Wei Lin

Laboratory of Multimedia Mobile Networking
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan, R.O.C.

Correspondence: [email protected]

Abstract: Bit-rate control has a critical influence on video coding and multimedia streaming. This paper proposes a novel H.264 bit-rate control scheme using 3-D perceptual quantization modeling (PQrc), which comprises two major encoding modules: the perceptual frame-level bit-allocation and the fast macroblock-level quantizer decision. The frame-level bit budget depends on the frame complexity (ε) and the buffer fullness, in which ε is weighted by the predicted mean-absolute-difference (MAD) and the just-noticeable-difference (JND) PSNR. For the MB-level quantizer decision, a 3-D bits-complexity-quantization model, denoted as B.C.Q., is established, in which the tangent slope of each B.C.Q. curve is a distinctive piece of information for finding a proper quantizer. In comparison with the latest H.264 JM10.2, our experiment results show that the proposed PQrc can improve the SNR quality and keep the buffer fullness stable with less computational cost.

Keywords: H.264, bit-rate control, just-noticeable-difference (JND) PSNR, weighted least square estimation.

I. INTRODUCTION

Recently, IP-based multimedia applications have become increasingly popular because of mature multimedia processing techniques, e.g., MPEG-1/2 for on-demand streaming video, H.261/H.263 for real-time video telephony/conferencing, and MPEG-4/H.264 for mobile multimedia communication. These video coders support several elementary compression functions, e.g., DCT, motion estimation/compensation, simple bit-rate control, etc. Among them, bit-rate control adapts the transmission rate and the quantizer to the network congestion in order to produce the proper amount of output bit-stream.

Three main issues of bit-rate control in Internet multimedia streaming are as follows:

(1) The scalable rate-adaptation mechanism: At the frame layer, the rate-controller can allocate adequate budget bits for each upcoming frame, or skip video data, to achieve steady buffer fullness. This prevents DCT coefficients from being quantized by the maximum quantizer when the allotted budget bits are exhausted. For example, TMN8 utilizes a plain linear function, i.e., b̃ = Rc/fr − ∆, where Rc is the channel rate and fr is the encoding frame rate, to assign budget bits (b̃) and to steady the buffer fullness by subtracting a regulator ∆ [3][9].

(2) The accurate quantization mechanism: At the macroblock (MB) layer, each MB quantizer must be decided based on the residual bit-rate and the MB complexity. The quantizer decision is the core of bit-rate control and directly affects encoding bit-rates and perceptual quality. In order to obtain a good fit of the DCT coefficient distribution, He, et al. proposed the R−ρ bit-rate control model, in which R is the bit-rate and ρ is the percentage of zeros among the quantized DCT coefficients in one macroblock [8]. He, et al. predicted the ρ-change between adjacent macroblocks to determine each proper quantizer.

(3) The steady buffer regulation: The ultimate objective of the two aforementioned techniques is to achieve steady buffer fullness. In a time-varying channel environment, e.g., a wireless network, when the buffer overflows, the rate-controller drops later frames until the buffer crisis is relieved. Frame dropping results in temporal degradation in terms of motion jerkiness perceived by human beings. Stable buffer regulation can smooth the multimedia bit-stream traffic and improve the playback quality. Two well-known buffering approaches proposed for stable buffer control are (1) the fluid flow traffic model and (2) the hypothetical reference decoder model [7][9].
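The linear TMN8-style budget quoted in item (1) can be sketched in a few lines. This is only an illustration of b̃ = Rc/fr − ∆ as stated above; the function name, the numeric values and the clamping of the result at zero are assumptions made here and are not taken from TMN8 itself.

```python
def tmn8_frame_budget(channel_rate_bps, frame_rate_fps, regulator_bits):
    """Return the frame budget b = Rc / fr - delta.

    The floor at zero is an assumption of this sketch, so that a large
    regulator never yields a negative budget.
    """
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    budget = channel_rate_bps / frame_rate_fps - regulator_bits
    return max(budget, 0.0)


# Hypothetical numbers: a 128 kbps channel, 10 fps, 800-bit regulator.
budget = tmn8_frame_budget(128_000, 10, 800)
```

A positive regulator lowers the budget when the buffer is too full, while a negative one would raise it.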

In this paper, we propose an H.264 bit-rate control scheme using 3-D perceptual quantization modeling, which is denoted as PQrc. The frame complexity (ε) and the buffer fullness (ω) are the main considerations for the frame-level bit-allocation in PQrc, in which ε is weighted based on the predicted MAD and the PSNR variation. In the traditional H.264 encoding scheme, the MAD value is an essential variable for rate control operations, but it is only available after the rate-distortion optimization (RDO) [1][5]. To address this "chicken and egg dilemma", the rate control should be able to estimate MAD or quantizers in advance. In contrast to previous research [2], in our proposed PQrc control scheme MAD is predicted using a dynamic energy table that records the energy (complexity) transition between adjacent frames. The expectation of the energy transition can be predicted based on previously encoded video clips. Besides, the just-noticeable-difference (JND) based PSNR is adopted to improve the accuracy of the frame complexity evaluation in consideration of the human visual model (HVM). Through off-line regression analyses, the correspondence between bit-rate (b) and complexity (ε) can be modeled as b = c0·ε² + c1·ε + c2, where c0, c1 and c2 are model parameters, so that each frame's budget bits (b̃) can be predicted when ε is known. In the macroblock-layer rate control, the 3-D bits-complexity-quantizer model, denoted as B.C.Q., is proposed to compute the MB quantizer. The tangent slopes of the curves in the B.C.Q. model provide useful information for the MB quantizer (QPmb) decision. When the MB complexity (εmb) and bit-quota (b̃mb) are known, we can find a QPmb that makes the tangent slope of its B.C.Q. curve approximately equal to b̃mb/εmb. Besides, the B.C.Q. model is updated continuously for newly generated video clips using the weighted least square estimation.

1-4244-0357-X/06/$20.00 ©2006 IEEE. This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

The rest of this paper is organized as follows. Section II surveys traditional JVT H.264 bit-rate control schemes. Section III introduces the proposed perceptual frame-level bit-allocation in H.264. Section IV describes the quantization parameter decision using the 3-D quantization modeling. Section V shows experiment results. Section VI gives concluding remarks.

II. PRELIMINARY

H.264/AVC is the emerging video coding standard jointly proposed by experts in the ITU-T and MPEG committees [6][11]. A distinctive H.264 feature is the usage of seven MB coding modes, namely SKIP, INTER16×16, INTER16×8, INTER8×8, INTER8×16, INTRA16×16 and INTRA4×4, such that the best spatial and temporal representation of each MB can be selected. In the H.264 reference software [12], the rate-distortion optimization (RDO) using the Lagrangian multiplier method is adopted for MB mode decisions. Unfortunately, if the RDO function is enabled, the implementation of H.264 bit-rate control becomes much more difficult than in other standards, which is called the "chicken and egg dilemma". Thus, the main challenge of H.264 bit-rate control is how to estimate MAD or the quantizer based on the information of previously coded frames. The proposed PQrc makes major revisions to the R-Q model and frame complexity estimation components of the H.264 rate-controller [13]. The detailed differences are highlighted as follows.

Complexity Estimation: The MAD of the motion compensation prediction error is usually treated as the frame complexity, i.e.,

$$\mathrm{MAD} = \sum_{(x,y) \in \text{all pixels}} \left| f(x,y) - \hat{f}(x,y) \right| \qquad (1)$$

where f(x,y) is the luminance value at the coordinate (x,y) in the original frame and f̂(x,y) is the corresponding value in the reconstructed frame.
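As a concrete reading of Eq. (1), the following minimal sketch accumulates the absolute luminance differences over two small frames stored as nested lists. It follows Eq. (1) literally, i.e., without dividing by the number of pixels; the function name and the sample frames are hypothetical.

```python
def frame_mad(original, reconstructed):
    """Sum of absolute luminance differences over all pixels, as in Eq. (1)."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of rows")
    total = 0
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("rows must have the same width")
        total += sum(abs(o - r) for o, r in zip(row_o, row_r))
    return total


# Hypothetical 2x2 luminance frames.
orig = [[10, 20], [30, 40]]
recon = [[12, 20], [27, 41]]
mad = frame_mad(orig, recon)  # 2 + 0 + 3 + 1
```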

In comparison with the frame complexity estimation of H.264 JM, the proposed PQrc constructs a table that dynamically records the energy (complexity) transition of encoded videos. According to the table data and the JND-based PSNR, PQrc can predict and refine the estimated frame complexity, and then determine frame-level budget bits accordingly.

Rate-Quantization Model: The R-Q model is used for calculating the quantizer (QP) based on the estimated MAD (MADe) and the residual budget bits (Rr). Similar to the MPEG-4 scalable rate control (SRC) [10], the quadratic rate-quantization function adopted in H.264 can be modeled as follows:

$$R_r = \frac{c_1 \cdot \mathrm{MAD}_e}{QP} + \frac{c_2 \cdot \mathrm{MAD}_e}{QP^2} \qquad (2)$$

where c1 and c2 are model parameters.

In PQrc, the MB quantizer is determined using the 3-D B.C.Q. model rather than the traditional R-Q model. We observe that the tangent slope of a B.C.Q. curve is a distinctive piece of information that is suitable for calculating the MB quantizer (QPmb). When the MB complexity (εmb) and bit-quota (b̃mb) are known, we can find a QPmb that makes the tangent slope of its B.C.Q. curve approximately equal to b̃mb/εmb. In order to adapt to current video properties, the procedures for updating the dynamic energy transition table and the B.C.Q. model are detailed in Sections III and IV, respectively.
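For reference, the traditional quadratic R-Q model of Eq. (2) can be inverted for QP by multiplying both sides by QP² and taking the positive root of Rr·QP² − c1·MADe·QP − c2·MADe = 0. The sketch below illustrates this inversion only; the function name, the clamping range of 1 to 51 and the fallback to the largest quantizer are assumptions of this example, not details given in the text.

```python
import math


def qp_from_quadratic_rq(residual_bits, mad_est, c1, c2, qp_min=1, qp_max=51):
    """Solve Rr = c1*MAD/QP + c2*MAD/QP^2 for QP and clamp the result."""
    if residual_bits <= 0:
        return float(qp_max)  # assumption: no bits left, coarsest quantizer
    # Multiply Eq. (2) by QP^2: Rr*QP^2 - c1*MAD*QP - c2*MAD = 0
    a = residual_bits
    b = -c1 * mad_est
    c = -c2 * mad_est
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return float(qp_max)  # assumption: no real root, coarsest quantizer
    qp = (-b + math.sqrt(disc)) / (2.0 * a)
    return float(min(max(qp, qp_min), qp_max))


# Hypothetical parameters chosen so that QP = 10 reproduces Rr = 60 bits.
qp = qp_from_quadratic_rq(residual_bits=60, mad_est=5, c1=100, c2=200)
```

The result is a real number; an encoder would still round it to an integer quantizer.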

III. THE PROPOSED PERCEPTUAL FRAME-LEVEL RATE-ADAPTATION IN H.264

In this section, the proposed perceptual frame-level rate-adaptation for H.264 rate control is introduced. A good frame-level rate-adaptation mechanism assigns sufficient budget bits to basic coding units, e.g., a frame, slice or macroblock, to reduce the visual distortion. In PQrc, the MAD estimation and the scene-change detection are the main concerns for the frame complexity evaluation. Besides, the buffer fullness is also considered in PQrc to prevent buffer instability and enhance the temporal video quality.

A. MAD Estimation Using the Dynamic Energy Transition Table

In PQrc, to address the aforementioned "chicken and egg dilemma" and predict the frame complexity precisely, the dynamic energy transition table is established off-line and updated on-line for estimating MAD iteratively. Firstly, numerous video clips are analyzed and the MAD differences (δMAD,i−1) between adjacent frames are calculated by

$$\delta_{MAD,i-1} = \mathrm{MAD}_{a,i} - \mathrm{MAD}_{a,i-1} \qquad (3)$$

where MADa is the actual MAD, i is the frame index, and MADa and δ are both real numbers rounded to one decimal place.

Through a large quantity of off-line statistics, the dynamic energy transition table is established to record the count of each δMAD with respect to MADa, which is denoted as C(δMAD, MADa). Note that δMAD is limited to the range [-9.9, 9.9] and MAD to the range [0.0, 9.9].

In the dynamic energy transition table, the expected value of δMAD for a specific MADa (denoted as Eδ(MADa)) is used for inferring the next predicted MAD, which is given by

$$E_\delta(\mathrm{MAD}_a) = \sum_{\delta=-9.9}^{9.9} \mathrm{Prob}_{\delta_{MAD}} \cdot \delta \qquad (4)$$

where Prob_δMAD denotes the probability of the item C(δMAD, MADa) in the dynamic energy transition table.

To perform the bit-rate control in cooperation with RDO, MAD should be predicted in advance by

$$\mathrm{MAD}_{e,i} = \mathrm{MAD}_{a,i-1} + E_\delta(\mathrm{MAD}_{a,i-1}) \qquad (5)$$


When the coding of the ith basic unit is finished, MADa,i and δMAD,i−1 can be obtained using Eq.(3), and then the count C(δMAD,i−1, MADa,i) is increased by one in the dynamic energy transition table. This ensures that the table contents remain suitable for the properties of the current video sequence.
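The table-based prediction of Eqs. (3) to (5) can be illustrated with the following minimal sketch. The class and method names are hypothetical. Two simplifications are assumptions of this example: values are stored as integer tenths, and each transition is counted under the previous frame's MAD so that the lookup in Eq. (5), which uses MADa,i−1, reads the same row that the update wrote.

```python
from collections import defaultdict


class EnergyTransitionTable:
    """Counts C(delta, MAD) and predicts the next MAD from them."""

    def __init__(self):
        # counts[mad_key][delta_key] -> occurrences; keys are integer tenths
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _key(value, low, high):
        # Quantize to one decimal place and clamp to the table range.
        return max(low, min(high, round(value * 10)))

    def update(self, mad_prev, mad_curr):
        """Record one observed transition (Eq. 3)."""
        delta = mad_curr - mad_prev
        m = self._key(mad_prev, 0, 99)    # MAD in [0.0, 9.9]
        d = self._key(delta, -99, 99)     # delta in [-9.9, 9.9]
        self.counts[m][d] += 1

    def expected_delta(self, mad_prev):
        """E_delta(MAD) = sum over delta of Prob(delta) * delta (Eq. 4)."""
        m = self._key(mad_prev, 0, 99)
        row = self.counts.get(m)
        if not row:
            return 0.0  # assumption: no statistics yet, predict no change
        total = sum(row.values())
        return sum(d / 10.0 * n for d, n in row.items()) / total

    def predict(self, mad_prev):
        """MAD_e,i = MAD_a,i-1 + E_delta(MAD_a,i-1) (Eq. 5)."""
        return mad_prev + self.expected_delta(mad_prev)


# Hypothetical history: three frames that followed a frame with MAD 3.0.
table = EnergyTransitionTable()
for next_mad in (3.4, 3.2, 3.6):
    table.update(3.0, next_mad)
predicted = table.predict(3.0)
```

With this history the expected change is the average of 0.4, 0.2 and 0.6, so the predicted MAD is close to 3.4.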

B. Fine-tuned Frame Complexity Evaluation Using JND-PSNR based Scene Change Detection

To compensate for the MAD prediction error in Eq.(5), the JND-PSNR based scene change detection is proposed in PQrc. In [2], the authors claimed that if the ith skipped frame and the previously coded frames belong to the same scene, the product (PSNRdrop,i · PSNRdrop−ratio,i) should typically be relatively small, in which PSNRdrop,i and PSNRdrop−ratio,i are respectively defined as

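$$\mathrm{PSNR}_{drop,i} = \mathrm{PSNR}_{i-1} - \mathrm{PSNR}_{skip,i} \qquad (6)$$

$$\mathrm{PSNR}_{drop\text{-}ratio,i} = \frac{\mathrm{PSNR}_{drop,i}}{\frac{1}{i}\sum_{l=1}^{i} \mathrm{PSNR}_{drop,l}} \qquad (7)$$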

In Eq.(6), to evaluate the whole distortion fairly, the ith skipped frame is replaced with the previously decoded one when calculating the distortion between the original and reconstructed video sequences. In our PQrc scheme, the PSNR is replaced with the JND-based PSNR so that only the just-noticeable-difference visual area is considered. The JND-based PSNR (PSNRJND) calculation is given by [4],

$$\mathrm{PSNR}_{JND} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_{JND}} \qquad (8)$$

where MSEJND is defined as

$$\mathrm{MSE}_{JND} = \frac{\sum_{x=1}^{N}\sum_{y=1}^{M} \left[\, |f(x,y) - \hat{f}(x,y)| - JND_{TH} \right]^2 \cdot \eta(x,y)}{N \cdot M} \qquad (9)$$

and the tuning factor η is defined as

$$\eta = \begin{cases} 1, & \text{if } |f(x,y) - \hat{f}(x,y)| > JND_{TH} \\ 0, & \text{if } |f(x,y) - \hat{f}(x,y)| \le JND_{TH} \end{cases} \qquad (10)$$

where the frame size is N × M and JNDTH is an empirical threshold. If the difference between f and f̂ does not exceed the threshold (JNDTH), η is set to zero to discard the imperceptible difference.
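Eqs. (8) to (10) translate directly into code. The following sketch assumes 8-bit luminance frames stored as equally sized nested lists; the function name and the choice to return infinity when no pixel exceeds the threshold are assumptions of this example.

```python
import math


def jnd_psnr(original, reconstructed, jnd_threshold):
    """JND-based PSNR of two equally sized 8-bit luminance frames."""
    n_pixels = 0
    squared_error = 0.0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            n_pixels += 1
            diff = abs(o - r)
            if diff > jnd_threshold:  # eta = 1 only above the threshold
                squared_error += (diff - jnd_threshold) ** 2
    if n_pixels == 0:
        raise ValueError("frames must not be empty")
    mse_jnd = squared_error / n_pixels  # Eq. (9)
    if mse_jnd == 0:
        return math.inf  # assumption: no perceptible distortion at all
    return 10.0 * math.log10(255.0 ** 2 / mse_jnd)  # Eq. (8)


# Hypothetical 2x2 frames: only the difference of 10 exceeds the threshold 4.
orig = [[100, 100], [100, 100]]
recon = [[100, 103], [110, 100]]
value = jnd_psnr(orig, recon, jnd_threshold=4)
```

In this example only one pixel contributes, with (10 − 4)² = 36, so MSE_JND is 36/4 = 9.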

Jiang, et al. considered that the PSNRdrop−ratio,i value in Eq.(7) can be regarded as the scene change degree [2]. In PQrc, PSNRJND is substituted for PSNR in Eq.(7) in consideration of human visual properties. The frame complexity (ε) is then fine-tuned piecewise by

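$$\varepsilon = \begin{cases} \mathrm{MAD}_e \cdot 1.0, & \text{if } S.C._{deg} \le 50 \\ \mathrm{MAD}_e \cdot 1.1, & \text{if } 50 < S.C._{deg} \le 100 \\ \mathrm{MAD}_e \cdot 1.2, & \text{if } 100 < S.C._{deg} \le 150 \\ \mathrm{MAD}_e \cdot 1.2, & \text{if } 150 < S.C._{deg} \le 200 \\ \mathrm{MAD}_e \cdot 1.3, & \text{if } S.C._{deg} \ge 200 \end{cases} \qquad (11)$$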

where S.C.deg equals (PSNRJND drop,i·PSNRJND drop−ratio,i).
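The scene-change degree and the piecewise scaling of Eq. (11) can be sketched as follows. The function names are hypothetical, the input is assumed to be the list of JND-based PSNR drops of Eq. (6) up to the current frame, and the handling of a zero average drop is an assumption of this example. The cases are evaluated in order, so a degree of exactly 200 falls in the 1.2 branch here.

```python
def scene_change_degree(psnr_drops):
    """S.C.deg for the latest frame: drop_i * drop_ratio_i (Eqs. 6 and 7).

    psnr_drops holds PSNR_drop,1 .. PSNR_drop,i, newest value last.
    """
    if not psnr_drops:
        raise ValueError("need at least one PSNR drop value")
    mean_drop = sum(psnr_drops) / len(psnr_drops)
    if mean_drop == 0:
        return 0.0  # assumption: no average drop means no scene change
    latest = psnr_drops[-1]
    return latest * (latest / mean_drop)


def tuned_complexity(mad_est, sc_deg):
    """Scale the estimated MAD by the piecewise factor of Eq. (11)."""
    if sc_deg <= 50:
        factor = 1.0
    elif sc_deg <= 100:
        factor = 1.1
    elif sc_deg <= 200:
        factor = 1.2  # Eq. (11) lists 1.2 for both (100,150] and (150,200]
    else:
        factor = 1.3
    return mad_est * factor


# Hypothetical PSNR drops (dB): mean 4.0, latest 10.0, ratio 2.5.
degree = scene_change_degree([1.0, 1.0, 10.0])
epsilon = tuned_complexity(5.0, degree)
```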

Eq.(11) shows that if a scene change is detected, PQrc increases the frame complexity (ε) slightly to add more bit-quota. Since the demanded frame bit-rate (b̃) generally grows with ε, we can model the relationship between ε and b̃ by

$$\tilde{b} = a_1 \cdot \varepsilon^2 + a_2 \cdot \varepsilon + a_3 \qquad (12)$$

where a1, a2 and a3 are empirical model parameters.

Considering the buffer fullness (ω), if ω is larger than a certain threshold (ν·TH), the rate control decreases the budget bits obtained from Eq.(12) (denoted as b̃′) by subtracting the overflowed bits (∆); otherwise, the controller increases b̃′ by adding the unused buffer capacity, to avoid buffer overflow/underflow and enhance the video quality, i.e.,

$$\tilde{b} = \tilde{b}' - \Delta, \qquad \Delta = \begin{cases} \omega f, & \text{if } \omega > \nu \cdot TH \\ \omega - \nu \cdot TH, & \text{otherwise} \end{cases} \qquad (13)$$

In general, human eyes cannot perceive fine-grained differences due to spatial/temporal masking effects. Thus, our proposed JND-based perceptual bit-allocation can filter out insignificant signals and preserve more budget bits for improving the temporal resolution.

IV. THE PROPOSED QUANTIZER DECISION USING THE 3-D QUANTIZATION MODEL

In this section, we present the proposed 3-D bits-complexity-quantizer (B.C.Q.) model for the fast MB quantizer decision, which is shown in Fig. 1. The proposed B.C.Q. model records the encoded bit-rate with respect to the complexity for each selected quantizer, gathered from numerous off-line video analyses. Fig. 1 shows that (1) the demanded bit-rate is roughly proportional to the MB complexity, (2) each Γc(b, ε) curve with respect to QP has an individual tangent slope (sQp), and these curves together form a curved surface, and (3) a smaller QP has a larger sQp. Thus, the tangent slopes of these curves can be used for the quantizer decision when the MB budget bits (b̃mb) and complexity (εmb) are known. The procedures of the quantizer decision are detailed as follows.

[Figure 1: 3-D surface plot with axes complexity, quantizer and bit-rate.]

Fig. 1. The illustration of the proposed 3-D bits-complexity-quantizer (B.C.Q.) model.


A. MB Quantizer Decision Using the 3-D B.C.Q. model

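The proposed 3-D B.C.Q. model is used for the MB quantizer decision. Firstly, we utilize quadratic functions to approximate the curves in Fig. 1 for the quantizers [1 . . . QPmax], which are modeled as

$$B = \Gamma_c \cdot [\varepsilon_{mb}^2 \;\; \varepsilon_{mb} \;\; 1]^T \qquad (14)$$

where B is the matrix $[\tilde{b}_{mb,QP=1} \ldots \tilde{b}_{mb,QP=QP_{max}}]^T$ and Γc is the matrix

$$\Gamma_c = \begin{bmatrix} \alpha_{11} & \beta_{12} & \gamma_{13} \\ \alpha_{21} & \beta_{22} & \gamma_{23} \\ \vdots & \vdots & \vdots \\ \alpha_{n1} & \beta_{n2} & \gamma_{n3} \end{bmatrix}$$

in which n = QPmax, and α, β and γ are the parameters of each B.C.Q. curve function.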

Based on the refined B.C.Q. model, we can compute the predicted MB bit-quota (b̃mb i,k) and the complexity (εmb i,k) of the kth macroblock in the ith frame to determine its proper quantizer. The current MB's complexity and the actual encoded bits of MBi−1,k are considered for the MB-level bit-allocation if these two macroblocks belong to the same coding type. Accordingly, the linear MB bit-allocation can be formulated by

$$\tilde{b}_{mb\,i,k} = \left( \left( \tilde{b}_i - \sum_{l=0}^{k-1} b_{mb\,i,l} \right) \cdot \frac{\varepsilon_{mb\,i,k}}{\sum_{k}^{N_{mb}} \varepsilon_{mb\,i,k}} \right) \cdot \lambda + b_{i-1,k} \cdot (1-\lambda) \qquad (15)$$

where b̃i is the total budget bits for the ith frame and λ is a weighting factor. If MBi,k and MBi−1,k are of the same coding type, λ is set to 0.5 empirically; otherwise, λ equals 0.8.

Then, in order to find the proper MB quantizer, the current MB property (tangent slope) is characterized by

$$\tilde{s}_{i,k} = \frac{\tilde{b}_{mb\,i,k} - \tilde{H}_{mb\,i,k}}{\varepsilon_{mb\,i,k}} \qquad (16)$$

where s̃i,k is the expected tangent slope of the current MB and H̃mb i,k is the estimated header bits of the current macroblock. Note that H̃mb i,k is only required for encoding the video information and motion vectors.

Hence, the proper MB quantizer is selected to minimize the absolute difference between s̃i,k and sQp, i.e.,

$$QP_{mb\,i,k} = \arg\min_{Qp = 1, \ldots, QP_{max}} \left\{ |\tilde{s}_{i,k} - s_{Qp}| \right\} \qquad (17)$$

When the budget bits are exhausted, we reset the quantizer to QPmax to prevent buffer overflow and reduce the number of skipped frames.
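The MB-level steps of Eqs. (15) to (17) can be sketched as below. The function names and sample values are hypothetical. How sQp is obtained from each curve is not spelled out above, so this sketch assumes it is the derivative 2αε + β of the fitted quadratic evaluated at the current MB complexity; treating a non-positive texture budget as "exhausted" is likewise an assumption.

```python
def mb_bit_quota(frame_budget, bits_spent, eps_current, eps_remaining_sum,
                 prev_frame_mb_bits, same_coding_type):
    """Eq. (15): blend a complexity-proportional share with MB(i-1,k) bits."""
    lam = 0.5 if same_coding_type else 0.8
    share = (frame_budget - bits_spent) * eps_current / eps_remaining_sum
    return share * lam + prev_frame_mb_bits * (1.0 - lam)


def select_qp(bit_quota, header_bits, eps_mb, gamma_c):
    """Eqs. (16)-(17): pick the QP whose curve slope is closest to the target.

    gamma_c[q - 1] = (alpha, beta, gamma) for quantizer q = 1..QPmax.
    """
    qp_max = len(gamma_c)
    if bit_quota - header_bits <= 0:
        return qp_max  # assumption: budget exhausted, coarsest quantizer
    target = (bit_quota - header_bits) / eps_mb  # Eq. (16)
    best_qp, best_gap = 1, float("inf")
    for qp, (alpha, beta, _gamma) in enumerate(gamma_c, start=1):
        slope = 2.0 * alpha * eps_mb + beta  # assumed definition of s_Qp
        gap = abs(target - slope)
        if gap < best_gap:
            best_qp, best_gap = qp, gap
    return best_qp  # Eq. (17)


# Hypothetical three-quantizer model with purely linear curves.
gamma_c = [(0.0, 8.0, 0.0), (0.0, 4.0, 0.0), (0.0, 2.0, 0.0)]
quota = mb_bit_quota(1000, 200, 10.0, 100.0, 60, same_coding_type=True)
chosen = select_qp(quota, header_bits=20, eps_mb=10.0, gamma_c=gamma_c)
```

Here the quota is 80·0.5 + 60·0.5 = 70 bits, the target slope is (70 − 20)/10 = 5, and the curve with slope 4 is the closest, so the second quantizer is chosen.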

B. Update B.C.Q. Model Using the Weighted Least Square Estimation

To ensure that the B.C.Q. model remains suitable for newly generated video clips, PQrc updates the model using the weighted least square estimation based on coded data sets (bmb, εmb, QP). When we have m data sets for a specific QP, the B.C.Q. curve function can be initialized by

$$\mathbf{b} = E \cdot Z + K \qquad (18)$$

where b is the matrix [b1 . . . bm]^T of encoded bit-rates, K is the matrix [κ1 . . . κm]^T of prediction errors, Z is [αQp βQp γQp]^T, and E is the matrix

$$E = \begin{bmatrix} \varepsilon_{Qp,1}^2 & \varepsilon_{Qp,1} & 1 \\ \varepsilon_{Qp,2}^2 & \varepsilon_{Qp,2} & 1 \\ \vdots & \vdots & \vdots \\ \varepsilon_{Qp,m}^2 & \varepsilon_{Qp,m} & 1 \end{bmatrix}.$$

According to the least square estimation, we define a prediction error indicator ψ that equals K^T K, i.e.,

$$\psi = K^T K = (\mathbf{b} - E Z)^T (\mathbf{b} - E Z) \qquad (19)$$

Eq.(19) can be expanded to (b^T b − b^T E Z − Z^T E^T b + Z^T E^T E Z). Setting ∂ψ/∂Z = −2 E^T b + 2 E^T E Z to zero, we can derive the best solution of Z, which equals (E^T E)^{−1} E^T b. As a compromise between video quality and computational cost, m is set to 10 by default. To remove the effect of outlier data sets, we weight each data set by a diagonal matrix W that is given by

$$W = \begin{bmatrix} \tau_1 & 0 & \cdots & 0 \\ 0 & \tau_2 & \cdots & 0 \\ \vdots & & \ddots & 0 \\ 0 & 0 & \cdots & \tau_m \end{bmatrix}$$

where each τ is inversely proportional to the deviation of its data set from the regular B.C.Q. curve.

Thus, the error indicator in Eq.(19) can be re-written as K^T W K, and the modified best solution of Z equals (E^T W E)^{−1} E^T W b.
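The weighted solution can be computed without any matrix library by accumulating the 3×3 normal equations and solving them directly. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the weights are passed in as a plain list of diagonal entries, and Gaussian elimination with partial pivoting is simply one convenient way to solve the system.

```python
def weighted_quadratic_fit(eps, bits, weights):
    """Solve (E^T W E) Z = E^T W b for Z = [alpha, beta, gamma]."""
    if not (len(eps) == len(bits) == len(weights)) or len(eps) < 3:
        raise ValueError("need at least three equally sized samples")
    # Accumulate the 3x3 normal matrix A = E^T W E and r = E^T W b.
    a = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for e, b, w in zip(eps, bits, weights):
        row = (e * e, e, 1.0)
        for i in range(3):
            r[i] += w * row[i] * b
            for j in range(3):
                a[i][j] += w * row[i] * row[j]
    # Gaussian elimination with partial pivoting on the augmented system.
    aug = [a[i] + [r[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, 3):
            f = aug[k][col] / aug[col][col]
            for j in range(col, 4):
                aug[k][j] -= f * aug[col][j]
    # Back substitution.
    z = [0.0] * 3
    for i in (2, 1, 0):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, 3))
        z[i] = (aug[i][3] - s) / aug[i][i]
    return z


# Hypothetical samples lying exactly on b = 2*eps^2 + 3*eps + 1.
eps = [1.0, 2.0, 3.0, 4.0, 5.0]
bits = [2.0 * e * e + 3.0 * e + 1.0 for e in eps]
alpha, beta, gamma = weighted_quadratic_fit(eps, bits, [1, 1, 0.5, 1, 1])
```

Because the sample points lie exactly on one quadratic, the weights do not change the fitted parameters in this example; with noisy data, a small τ reduces the pull of an outlier.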

V. EXPERIMENT RESULTS

In our experiments, the PQrc performance is compared with that of the latest JM10.2 model in terms of visual quality, channel bandwidth utilization, flickering degree, etc. The adopted test videos in QCIF resolution (176×144) cover several kinds of scenes, such as "Akiyo" for low motion, "Foreman" for intense motion, and "Suzie" for typical video telephony.

According to the experiment results, the proposed PQrc gains 0.4 dB in PSNR on average over the JM10.2 model owing to the perceptual frame-level bit-allocation. Fig. 2 depicts image quality comparisons for the Foreman sequence, in which the bottom images, encoded by our proposed PQrc, show an apparent SNR improvement over those of the JM10.2 model. As shown in Table I, since the current buffer status is also considered for the frame-level bit-allocation in Eq.(13), a stable buffer can be achieved when the queued bits approach the maximum buffer capacity. The proposed JND-PSNR based bit-allocation in Eq.(8) can reduce the flickering degree in consideration of human visual properties. A smaller PSNR standard deviation indicates less flickering and more consistent quality. Additionally, the proposed 3-D B.C.Q. model requires only a few division and comparison operations to determine each MB quantizer, so PQrc achieves a higher encoding frame rate than the H.264 JM model. Under the same coding conditions, the proposed PQrc encodes 0.462 frames per second (fps) versus 0.28 fps for JM10.2, i.e., a speed-up of 1.65 times. Fig. 3 shows the better PSNR gained by the proposed PQrc over the whole Foreman sequence.

Fig. 2. Comparison of the image quality based on the Foreman sequence.

Fig. 3. PSNR histogram of the Foreman sequence decoded by the JM10.2 coder and the proposed PQrc.

VI. CONCLUSION

This paper proposed an H.264 bit-rate control framework using the 3-D quantization model to enhance visual quality over JM10.2. The major contributions of PQrc are twofold: (1) the perceptual frame-level bit-allocation and (2) the fast MB quantizer decision. At the beginning of the bit-rate control, the proposed PQrc predicts MAD using the dynamic energy transition table and assigns the frame-level budget bits properly in consideration of the human visual system. Besides, the 3-D B.C.Q. model was established to support a fast and precise quantizer decision depending on the current MB properties measured by its tangent slope. Through continual model updates, the proposed MAD estimation and B.C.Q. models can reach a stable stage for newly incoming video clips. The experiment results show that PQrc performs better than the H.264 JM model in terms of both buffer stability and visual quality.

TABLE I
TABULATION OF THE ACHIEVED BITS, PSNR AND ITS STANDARD DEVIATION (DEV.) AT 128 KBPS.

Video     Rate control   Achieved bits (Kbps)   PSNR (dB)   PSNR dev.
Akiyo     JM10.2         128.36                 48.021      0.858
Akiyo     PQrc           128.36                 48.362      0.846
Foreman   JM10.2         128.90                 37.234      1.451
Foreman   PQrc           128.87                 37.521      1.191
Suzie     JM10.2         129.58                 41.031      1.369
Suzie     PQrc           129.43                 41.394      1.308

ACKNOWLEDGEMENT

This research is partially supported by the National Science Council of the Republic of China, Taiwan, under contract number NSC 95-2219-E-006-009; the Computer Communication Laboratory (CCL), Industrial Technology Research Institute (ITRI), Taiwan, Republic of China; and Intel Microelectronics Asia Ltd., Taiwan Branch.

REFERENCES

[1] J.H. Park, J.K. Han and B.C. Song, "An adaptive quantization using modified QP in H.264", Proceedings of IEEE International Conference on Consumer Electronics, pp.229-230, January 2005.

[2] M. Jiang and N. Ling, "An improved frame and macroblock layer bit allocation scheme for H.264 rate control", Proceedings of IEEE International Conference on Circuits and Systems, pp.1501-1504, May 2005.

[3] N. Kamaci, Y. Altunbasak and R.M. Mersereau, "Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models", IEEE Transactions on Circuits and Systems for Video Technology, VOL.15, NO.8, pp.994-1006, August 2005.

[4] P. Lambert, W.D. Neve, P.D. Neve, I. Moerman, "Rate-distortion Performance of H.264/AVC Compared to State-of-the-art Video Codecs", IEEE Transactions on Circuits and Systems for Video Technology, VOL.16, NO.1, pp.134-140, January 2006.

[5] S. Miyaji, Y. Takishima and Y. Hatori, "A Novel Rate Control Method for H.264 Video Coding", Proceedings of IEEE International Conference on Image Processing, VOL.2, pp.309-312, September 2005.

[6] T. Wiegand, G.J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Transactions on Circuits and Systems for Video Technology, VOL.13, NO.7, pp.560-576, July 2003.

[7] Z. Li, W. Gao and F. Pan, "Adaptive Rate Control with HRD consideration", JVT-H017, 8th Meeting: Geneva, May 2003.

[8] Z. He, Y.K. Kim and S.K. Mitra, "Low-delay rate control for DCT video coding via ρ-domain source modeling", IEEE Transactions on Circuits and Systems for Video Technology, VOL.11, NO.8, pp.928-940, August 2001.

[9] ITU-T SG16 Video Coding Experts Group, "Video Codec Test Model, Near-Term, Version 8 (TMN8)", September 1997.

[10] ISO/IEC JTC1/SC29/WG11, "MPEG-4 Video Verification Model v18.0, Coding of Moving Pictures and Audio N3908", January 2001.

[11] ISO/IEC JTC1, Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding, ISO/IEC FDIS 14496-10, 2003.

[12] JVT/AVC reference software, "http://iphome.hhi.de/suehring/tml/download/".

[13] Available from "http://www.pixeltools.com/rate control paper.html".

