
    A performance evaluation of MPEG-21 BSDL in the context of H.264/AVC

    Wesley De Neve+, Sam Lerouge+, Peter Lambert+, and Rik Van de Walle*

    +Ghent University, Sint-Pietersnieuwstraat 41 B-9000, Ghent, Belgium; *Ghent University - IMEC, Sint-Pietersnieuwstraat 41 B-9000, Ghent, Belgium

    ABSTRACT

    H.264/AVC is a new specification for digital video coding that aims at a deployment in a lot of multimedia applications, such as video conferencing, digital television broadcasting, and internet streaming. This is for instance reflected by the design goals of the standard, which are about the provision of an efficient compression scheme and a network-friendly representation of the compressed data. Those requirements have resulted in a very flexible syntax and architecture that is fundamentally different from previous standards for video compression.

    In this paper, a detailed discussion will be provided on how to apply an extended version of the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL) to the Annex B syntax of the H.264/AVC specification. This XML based language will facilitate the high-level manipulation of an H.264/AVC bitstream in order to take into account the constraints and requirements of a particular usage environment. Our performance measurements and optimizations show that it is possible to make use of MPEG-21 BSDL in the context of the current H.264/AVC standard with a feasible computational complexity when exploiting temporal scalability.

    Keywords: AVC, BSDL, Content Adaptation, Content Description, H.264, MPEG, Scalability

    1. INTRODUCTION

    H.264/AVC is a new specification for digital video coding [1], characterized by a design that targets efficiency, robustness, and usability [2]. Because of its support for a wide range of bit rates [3], H.264/AVC can even be considered as a universal standard for digital video coding. The latter implies that the specification in question will be used under the hood of a lot of multimedia applications in the very near future. Those video-enabled applications will most probably be deployed on a wide variety of terminals, exchanging information with each other by making use of several types of networks. This is not a very attractive situation for content providers because they see themselves as being obliged to provide several versions of the same multimedia presentation in order to reach a target audience that is as large as possible. It would be much more efficient if they only had to provide one presentation that could be reused under all circumstances. A solution for this diversity is the usage of scalable video coding, together with a complementary content adaptation system.

    In the current H.264/AVC specification, there are no explicit provisions for enabling scalability, although some efforts are emerging [4]. Scalability is currently a hot topic in the video coding and content adaptation community [5] because scalable coding should make it possible to deal with the growing variety of networks and terminals in an efficient way. To be more specific, think for example about the scenario of a user who has a large collection of music video clips at his or her disposal. One may assume that all video streams are encoded at a very high quality, for instance by making use of an efficient implementation of the Main Profile as available in the H.264/AVC specification. As such, the media files in question are suited for playback on a digital home entertainment system. But what if the user wants to enjoy the same video clips on a mobile device when traveling to work by train? Then the need arises for a content adaptation system that should make it possible to realize an efficient transfer of the video clips from the full-featured PC to a mobile device, taking into account the constraints of the new usage environment (such as a limited battery life, a reduced screen resolution, ...).

    Further author information: (Send correspondence to Wesley De Neve)
    Wesley De Neve: E-mail: [email protected], Telephone: +32 (0)9 264 89 29
    Sam Lerouge: E-mail: [email protected], Telephone: +32 (0)9 264 89 17
    Peter Lambert: E-mail: [email protected], Telephone: +32 (0)9 264 89 29
    Rik Van de Walle: E-mail: [email protected], Telephone: +32 (0)9 264 33 68

    Applications of Digital Image Processing XXVII, edited by Andrew G. Tescher, Proceedings of SPIE Vol. 5558 (SPIE, Bellingham, WA, 2004) 0277-786X/04/$15 doi: 10.1117/12.564822

    Although H.264/AVC is a specification for single-layered video compression, we will show how an extended version of MPEG-21 BSDL can be used in combination with the Annex B syntax in order to enable some high-level manipulations of an H.264/AVC bitstream. In particular, we will discuss some results with regard to the performance when using BSDL to exploit a trivial form of temporal scalability in H.264/AVC.

    The outline of the paper is as follows: after an in-depth overview of the involved technologies in section 2, a description of the applied methodology for performing the measurements is provided in section 3. Section 4 discusses the obtained results and section 5 concludes.

    2. MPEG-21 BSDL IN THE CONTEXT OF H.264/AVC

    This section describes the different technologies that were involved in our research. First of all, an overview is given of the design characteristics and syntaxes of the H.264/AVC specification. Second, a discussion is provided of the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL), together with a detailed description of the extensions that were needed in order to describe a large part of the Annex B syntax.

    2.1. Overview of the Design Characteristics and Syntaxes of H.264/AVC

    In order to cope with the diversity of the current and future network protocols, H.264/AVC can rely on its two-tier architecture. As illustrated by Figure 1(a), this architecture consists of a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). While the VCL is responsible for the efficient compression of the video data, the NAL is responsible for transforming the compressed video data into a generic stream of logical data units. The latter are called Network Abstraction Layer Units (NALUs) and those syntax structures have the property that their mapping to a transport protocol (RTP, MPEG-2 Systems, ...) or storage format (the ISO Media File Format, ...) can be considered straightforward.

    In fact, the NALUs are the fundamental units of processing in the context of H.264/AVC. As illustrated by the NALU layer in Figure 1(b), the units in question consist of a one byte header and a payload. The structure of the payload is determined by the value of the nal_unit_type syntax element, as available in the NALU header. As such, the NALUs are responsible for the delivery of several types of data. For instance, the parameter set NALUs carry parameters necessary for the decoding process while the coded slice NALUs contain the actual compressed video data. The allowed NALU type codes are provided by Figure 1(c) (a similar table can be found in the standards document) while Figure 1(d) provides some more detail about the dependencies between the several types of NALUs and the location of some of the most relevant decoding parameters.
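    The one byte NALU header described above can be unpacked with a few bit operations. The following is a minimal sketch in Python, assuming the usual field widths of 1, 2, and 5 bits for forbidden_zero_bit, nal_ref_idc, and nal_unit_type; the names NaluHeader and parse_nalu_header are illustrative and not part of any reference software.

```python
from typing import NamedTuple


class NaluHeader(NamedTuple):
    """Fields of the one byte NALU header (illustrative container)."""
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nalu_header(first_byte: int) -> NaluHeader:
    """Split the first byte of a NALU into its three header fields.

    Assumed layout (most significant bit first):
    1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("a NALU header is exactly one byte")
    return NaluHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


if __name__ == "__main__":
    # 0x67 is a typical first byte of a sequence parameter set NALU.
    print(parse_nalu_header(0x67))
```

    Only nal_ref_idc and nal_unit_type are needed for the high-level adaptations discussed in this paper, which is why a description up to the level of the NALU header is already useful.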

    The bitstream containing the coded representation of the header information and the video data can be described by making use of two syntaxes: the byte stream format or the NAL unit stream format. The byte stream syntax is characterized by the fact that the NALUs are separated from each other by making use of zero or more zero-valued bytes and a start code prefix (see Figure 1(b)). This syntax is also known as the Annex B syntax. Otherwise, without the presence of zero-valued bytes and start code prefixes, one is dealing with the NALU syntax. The latter is for instance useful in systems that provide their own kind of framing. The byte stream format can be constructed from the NAL unit stream format by ordering the NAL units in decoding order and prefixing each NAL unit with zero or more zero-valued bytes and a start code prefix.
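    The relation between the two syntaxes can be sketched as follows. This is a simplified illustration in Python, not the parser of the reference software: the function names are hypothetical, and the sketch assumes that a NALU never ends in a zero-valued byte, so that zero bytes in front of the next start code prefix can be stripped safely.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def split_annex_b(stream: bytes) -> list[bytes]:
    """Extract the NALUs from a byte stream (Annex B) formatted buffer."""
    nalus = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero-valued bytes before the next start code prefix are framing,
        # not payload (assumption: a NALU does not end in 0x00).
        nalus.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return nalus


def build_annex_b(nalus: list[bytes]) -> bytes:
    """Prefix every NALU with one zero byte and a start code prefix."""
    return b"".join(b"\x00" + START_CODE_PREFIX + nalu for nalu in nalus)


if __name__ == "__main__":
    stream = build_annex_b([b"\x67\xaa", b"\x68\xbb", b"\x41\xcc"])
    print([nalu.hex() for nalu in split_annex_b(stream)])
```

    Searching for the start code prefix in this way only works because emulation prevention bytes keep the prefix from occurring inside a NALU, a point that returns in section 2.3.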

    2.2. Overview of the MPEG-21 Bitstream Syntax Description Language

    MPEG-21 BSDL is an XML based language for the description of the syntax of (scalable) bitstreams. In order to avoid a large overhead and unnecessary computations, the language in question will most often only be used for the description of the high-level structure of a bitstream [6]. BSDL was developed in the context of the MPEG-21 Multimedia Framework, which aims to enable the transparent and augmented use of multimedia resources across a wide range of networks and devices [7]. It is actually embedded in part 7 of MPEG-21, the latter better known as MPEG-21 Digital Item Adaptation (MPEG-21 DIA) [8].

    Footnote: Sometimes other languages are used for the description of the syntax of media related bitstreams. For instance, the MPEG-4 Systems standard makes use of the Syntactic Description Language (SDL). This language makes it possible to document the syntax of object-oriented structures in a C++ kind of way.

    [Figure 1. Schematic overview of the design characteristics and syntaxes of H.264/AVC. Panels: (a) An H.264/AVC encoder in the context of BSDL; (b) Structure of a NAL unit carrying slice data; (c) NALU type codes; (d) NALU dependencies and relevant syntax elements.]

    Figure 1(b), structure of a NAL unit carrying slice data: in the byte stream, a NAL unit (NALU) is preceded by zero_byte (0x00) and start_code_prefix_one_3bytes (0x000001). The NALU consists of a NAL header (size of 1 byte: forbidden_zero_bit, nal_ref_idc, nal_unit_type) and a raw byte sequence payload (RBSP), the latter being a string of data bits (SODB) followed by stuffing. At the slice layer, the payload holds the slice header and the slice data; at the macroblock layer, the slice data consists of macroblocks (MB).

    Figure 1(c), NALU type codes:

        nal_unit_type   NALU content (RBSP structure)
        0               Unspecified
        1               Coded slice of a non-IDR picture
        2               Coded slice data partition A
        3               Coded slice data partition B
        4               Coded slice data partition C
        5               Coded slice of an IDR picture
        6               Supplemental Enhancement Information (SEI)
        7               Sequence Parameter Set (SPS)
        8               Picture Parameter Set (PPS)
        9               Access unit delimiter
        10              End of sequence
        11              End of stream
        12              Filler data
        13..23          Reserved
        24..31          Unspecified (24-29 allocated for RTP)

    Figure 1(d), NALU dependencies and relevant syntax elements:

        - (Active) sequence parameter set (SPS): parameters valid for an entire sequence (profile@level information, resolution, number of reference pictures)
        - (Active) picture parameter set (PPS): parameters valid for at least one picture (type of entropy coding, number of slice groups (FMO), initial values for quantisation parameter, parameters for deblocking filter)
        - Slice header: frequently varying parameters (slice type, address of first macroblock in slice)
        - NALU type: available in the NAL header

    The motivation behind the development of MPEG-21 BSDL is the fact that having a scalable format alone is not sufficient. One also needs a program for the analysis and the actual adaptation of (scalable) bitstreams. Because every coding format has its own structure, one would expect at first sight that a separate program is required for every specific coding format. However, a more generic solution can be devised. To be more specific, it is possible to create a universal program for the analysis and adaptation of (scalable) bitstreams by relying on a common language for the description of the syntax of a specific coding format. Such a language was developed in the context of MPEG-21 and is known as BSDL. The language in question is in fact based on some extensions to W3C XML Schema on the one hand (bitwise datatypes, ...) and on some restrictions to W3C XML Schema on the other hand (the occurrence of attributes is for instance prohibited in the resulting XML description of the structure of a bitstream because XML Schema allows attributes to occur in an arbitrary order, which is naturally not the case for syntax elements, ...). Making use of XML and XML Schema has several advantages: one can reuse a lot of already existing tools for doing XML related operations and it also allows a more straightforward integration with other XML based standards in the long term.

    Footnote: XML Schema is a recommendation of the World Wide Web Consortium (W3C), making it possible to specify some rules with respect to the structure of an XML document, the nomenclature of XML elements and attributes, ... [9-11]

    To make this concrete, Figure 2(a) provides a simplified example that illustrates how MPEG-21 BSDL can be combined with H.264/AVC. An excerpt of the developed scheme in BSDL for the Annex B syntax of the H.264/AVC specification is available in Annex A. On the right side of Figure 2(a), one can notice a video stream, i.e. a sequence of slices. On the left side of the picture an XML based BSD is provided, describing the high level structure of the H.264/AVC bitstream. This XML description contains several elements. As illustrated by the arrows, most of the elements are linked to a corresponding slice and contain some information about the slice in question, such as the type of the slice and the position of its first and last byte in the compressed stream.

    In a next step, it is possible to apply some changes to this XML description. For instance, one can decide to drop the XML elements that are linked to the B slices. The interesting thing about this is that one can provide this altered XML description to a content adaptation engine that is smart enough to recognize the changes that were done in the XML domain (which is a more abstract or high-level approach for doing content manipulation). As such, the content adaptation engine can apply those changes in the compressed domain, resulting in a bitstream without B slices. This temporally downsampled bitstream is, for instance, now more appropriate for playback on a mobile device. Since the H.264/AVC specification is time unaware, one may also assume that the synchronisation of the remaining H.264/AVC samples can be taken into account by relying on a file format or a network protocol, as illustrated by the Systems layer in Figure 1(a). With respect to the content adaptation engine, this piece of logic is available in the MPEG-21 reference software package.
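    Dropping the elements that describe B slices can be illustrated with a few lines of Python. The XML vocabulary below (bitstream, slice, type, payload) and the byte ranges are invented for this sketch and do not reproduce the actual output of the BinToBSD tool; an XML API is used here instead of an XSL stylesheet. In line with BSDL, the description relies on child elements rather than attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified bitstream syntax description.
DESCRIPTION = """<bitstream>
  <slice><type>I</type><payload>25-2637</payload></slice>
  <slice><type>B</type><payload>2638-2746</payload></slice>
  <slice><type>B</type><payload>2747-2902</payload></slice>
  <slice><type>P</type><payload>2903-3856</payload></slice>
</bitstream>"""


def drop_b_slices(xml_text: str) -> str:
    """Return the description without the elements describing B slices."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy because elements are removed while looping.
    for slice_elem in list(root.findall("slice")):
        if slice_elem.findtext("type") == "B":
            root.remove(slice_elem)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_b_slices(DESCRIPTION))
```

    The transformation only touches the XML description; the compressed data itself is adapted afterwards by the content adaptation engine.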

    The process as discussed before is summarized in a formal way in Figure 2(b). In the first step, one starts from a bitstream typically encoded at a high quality such that it is useful to derive other versions of this particular bitstream. This parent bitstream is given as input to the BinToBSD tool, being part of the MPEG-21 reference software, together with a description of the Annex B syntax at a certain granularity (for instance a description up to the level of the NALU header or up to the level of the slice_header()). The latter syntax description is written down by making use of BSDL. The BinToBSD tool is now capable of generating an XML description of the structure of the H.264/AVC bitstream in question.

    In a next step, one can apply a set of filters to the XML based bitstream syntax description (BSD) of the H.264/AVC bitstream. For instance, in a first stage one can apply a filter in order to simplify the XML description in question or in order to add some metadata such that smarter adaptations are possible [12]. For example, based on MPEG-7 metadata, one can highlight that part of the video stream that is dealing with a sports scene. After this preprocessing step, one can apply zero or more filters in order to realize the actual manipulation of the XML description, such as dropping the XML elements describing the B slices or, for instance, selecting the scenes that contain sports content. Which filter to apply can be made dependent on a negotiation process making use of multi-criteria optimization [13]. This will finally result in an appropriate XML description. The filters can be implemented by relying on several technologies, such as Extensible Stylesheet Language (XSL) documents, an XML API, ...

    In a final step, the adapted BSD can be provided to the BSDToBin tool. Together with the original bitstream and the document describing (a part of) the H.264/AVC syntax, this will result in an adapted bitstream that is suited for a particular usage environment.
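    The final step can be pictured as copying the byte ranges that survive in the adapted description from the original bitstream to the output. The sketch below is a toy stand-in for that idea, not the BSDToBin tool itself: the payload element holding an inclusive "first-last" byte range is a hypothetical format chosen for this example.

```python
import xml.etree.ElementTree as ET


def rebuild_bitstream(original: bytes, adapted_xml: str) -> bytes:
    """Concatenate the byte ranges referenced by the adapted description.

    Assumption: every <payload> element holds an inclusive byte range
    "first-last" that points into the original bitstream.
    """
    root = ET.fromstring(adapted_xml)
    output = bytearray()
    for payload in root.iter("payload"):
        first, last = (int(part) for part in payload.text.split("-"))
        output += original[first:last + 1]
    return bytes(output)


if __name__ == "__main__":
    original = bytes(range(20))
    adapted = ("<bitstream>"
               "<slice><payload>0-3</payload></slice>"
               "<slice><payload>10-12</payload></slice>"
               "</bitstream>")
    print(rebuild_bitstream(original, adapted).hex())
```

    Because only references into the parent bitstream are stored, the media data does not have to be recoded to obtain the adapted version.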

    Note that the BSD, as generated by the BinToBSD tool, only has to be created once in a production environment. This observation also applies to the preprocessing step. When the bitstream syntax description is available at a sufficiently detailed level, it should also be possible to derive several versions of the original H.264/AVC bitstream in order to meet the requirements of a particular usage environment. It is also important to know that MPEG-21 BSDL often allows doing data manipulations without requiring a recode of the media data in question, although it is possible that some side effects have to be solved. The latter will be discussed in a next section. It should also be clear that MPEG-21 BSDL allows realizing manipulations of multimedia content at a more abstract level once a BSD of a particular bitstream is available, thus making it possible to enter the semantic domain (i.e. not having to deal any longer with the pure bits and bytes).

    Footnote: When dropping B slices, we assume that every remaining slice can be reconstructed without having to rely on a B slice.


    [Figure 2. Schematic overview of MPEG-21 BSDL in the context of H.264/AVC. Panels: (a) An XML description of an H.264/AVC bitstream: a sequence of I, B, B, P, B, B slices along the time axis t, with XML elements pointing to byte ranges in the bitstream (0-24, 25-2637, 2638-2746, 2747-2903, 2903-3857, 3857-3972, 3973-4103); (b) Content adaptation by making use of BSDL.]

    Figure 2(b), elements of the adaptation chain (numbered 1 to 7 in the figure): Original Bitstream [myPrecious_30hz.264]; BinToBSD + h264_avc.bsd; Pre-processing; XML Description [myPrecious_30hz.xml]; Filters (XSLT stylesheets, for instance [drop_BSlices.xsl]); Post-processing; Adapted XML Description [myPrecious_10hz.xml]; BSDToBin; Scaled Bitstream [myPrecious_10hz.264]. Universal Adaptation Engine = BinToBSD + XSL processor + BSDToBin.

    2.3. Combining H.264/AVC and MPEG-21 BSDL: Implementation Aspects

    So far, a Bitstream Syntax Description (BSD) scheme for the Byte Stream NALU syntax was implemented. This scheme makes it possible to describe every syntax element up to the level of the slice_header() structure. For a lot of applications it will not be necessary to have access to all those syntax elements in order to realize the desired functionality. For instance, in order to exploit temporal scalability by dropping non-reference B slices, one only needs access to the nal_ref_idc and the slice_type parameter, respectively available in the NALU header and slice_header() as illustrated by Figure 1(d). In order to exploit quality scalability realized by making use of the data partitioning feature of the H.264/AVC specification, one only needs access to the nal_unit_type parameter. For the actual implementation of the BSD scheme for the Annex B syntax, we had to rely on some extensions to the current BSDL specification. These non-normative extensions have already been touched upon in a previous paper from a more high-level point of view [14]. In the next paragraphs, the extensions in question will be covered in some more detail.
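    The decision that underlies this form of temporal scalability needs only the two parameters mentioned above. As a minimal sketch in Python, assuming that slice_type values 1 and 6 denote B slices and that the slices of interest are carried in NALUs with nal_unit_type 1 (coded slice of a non-IDR picture), the test could look as follows; the function name is illustrative.

```python
def is_droppable_b_slice(nal_ref_idc: int, nal_unit_type: int,
                         slice_type: int) -> bool:
    """Tell whether a NALU carries a non-reference B slice.

    Assumptions of this sketch: slice_type % 5 == 1 identifies a B slice
    (values 1 and 6), and data partitioned slices are ignored.
    """
    is_coded_slice = nal_unit_type == 1
    is_non_reference = nal_ref_idc == 0
    is_b_slice = slice_type % 5 == 1
    return is_coded_slice and is_non_reference and is_b_slice


if __name__ == "__main__":
    # (nal_ref_idc, nal_unit_type, slice_type) for an I, P, B, P, B pattern.
    nalus = [(3, 5, 7), (2, 1, 5), (0, 1, 6), (2, 1, 5), (0, 1, 6)]
    kept = [n for n in nalus if not is_droppable_b_slice(*n)]
    print(len(nalus), "->", len(kept))
```

    A B slice with nal_ref_idc different from zero is kept, because other pictures may depend on it.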

    First of all, we had to make use of the fillByte datatype. This construction makes it possible to force the BinToBSD parser to search for the next byte aligned position. As such, fillByte maps to the syntax function byte_aligned(), which can be found in the H.264/AVC specification (although the semantics are not entirely the same). Moreover, the type fillByte can be used for debugging purposes but also for limiting the overhead of the amount of XML data produced when describing the structure of a bitstream. This is important because the function that skips all data up to the next start code prefix can only be called on a byte aligned position. When dealing with bitstreams that are synthesized by making use of syntax elements encoded by a variable length code (VLC), the fillByte type also proves to be very useful. In case the fillByte datatype is not available, one is often forced to parse the bitstream to a position that is known for its byte alignment, despite the fact that one is not interested in all the information that is being parsed. It is also important to realize that the current informative implementation of the fillByte datatype, as provided by the developers of the reference software, has a lossy character. This can result in some unexpected side effects when doing editing operations on syntax elements represented by a VLC. Such a scenario will be illustrated by an example in one of the following paragraphs.

    Second, we also had to make use of the implementation construction. This extension makes it possible to rely on procedural objects in order to perform complex computations or in order to deal with complex datatypes. Complex means that it is not trivial or just impossible to do the computations by making use of BSDL, or that it is not possible to describe a datatype by making use of the just mentioned language in an efficient way. To be more specific, the implementation attribute makes it possible to call Java classes from the BSD scheme written in BSDL.

    [Figure 3. Generation of a corrupt bitstream due to the usage of the fillByte construction. Panels: (a) Manipulation of the first_mb_in_slice parameter: the three slices of a picture have first_mb_in_slice values 0, 33, and 66; exchanging the values of slice 1 and slice 2 (33, 0, 66) triggers Arbitrary Slice Order (ASO); (b) Byte alignment problem. Note that 1 is the binary exponential Golomb representation for the decimal zero, and that 00000100010 is the binary exponential Golomb representation of 33.]

    Figure 3(b), byte alignment problem (bitstream shown in binary, b denoting an arbitrary bit):

        before (first_mb_in_slice = 0):    bbbbbb11
        after (first_mb_in_slice = 33):    bbbbbb00 00010001 0???????   (also shown as bbbbbb00 00010001 00000001)

    The implementation construction was used, among others, for parsing the syntax elements that are encoded by the signed or unsigned exponential Golomb entropy coding scheme (i.e. this is one of the cases in which one has to deal with a complex datatype). It is possible to describe those entropy coding schemes in BSDL, but this will result in a tremendous overhead: every single bit of an exponential Golomb coded syntax element has to be put in one XML element. On top of that, it is not straightforward to interpret or decode the resulting XML description of the syntax element in question by making use of XPath. The latter technology makes it possible to perform queries against an XML document for retrieving the value of a particular XML element, ... [15] This functionality is required when one wants to apply changes to a BSD. For instance, the decoding of elements that are encoded by the entropy coding schemes in question is necessary for realizing temporal scalability since the slice_type parameter is represented by an exponential Golomb codeword.
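    The kind of computation that such a procedural object has to perform is small. The following Python sketch shows unsigned and signed exponential Golomb coding on strings of '0' and '1' characters; it is an illustration of the coding scheme only, not the Java classes used in our implementation, and all function names are hypothetical.

```python
def ue_encode(value: int) -> str:
    """Unsigned exponential Golomb code of a non-negative integer."""
    if value < 0:
        raise ValueError("ue(v) is defined for non-negative integers only")
    code = bin(value + 1)[2:]            # binary form of value + 1
    return "0" * (len(code) - 1) + code  # as many leading zeros as info bits


def ue_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one ue(v) code word at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def se_encode(value: int) -> str:
    """Signed code: positive v maps to 2v - 1, other values to -2v."""
    return ue_encode(2 * value - 1 if value > 0 else -2 * value)


def se_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one se(v) code word at pos; return (value, next position)."""
    code_num, end = ue_decode(bits, pos)
    magnitude = (code_num + 1) // 2
    return (magnitude if code_num % 2 else -magnitude), end


if __name__ == "__main__":
    for number in (0, 1, 33):
        print(number, ue_encode(number))
```

    The code word length grows with the value, which is exactly why editing such a syntax element can shift the position of everything that follows it.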

    The implementation construction is also used for the parsing of the slice_group_change_cycle syntax element. This parameter occurs as the last syntax element in the slice_header() syntax structure when Flexible Macroblock Ordering (FMO) types 2, 3, or 4 are used, the latter supporting evolving slice groups. As such, the parameter in question determines the number of macroblocks in slice group 0. The main reason for using the implementation construction lies in the fact that the number of bits for the representation of the slice_group_change_cycle syntax element has to be computed by evaluating the logarithmic function with base two, a common operation when parsing bitstreams. However, the latter is not available in the XPath specification (i.e. this is one of the cases in which complex computations are necessary). When the set of input values is limited, this limitation can be circumvented by making use of the union element and precalculated values (thus no longer requiring the evaluation of the logarithmic function). However, the latter is not the case for the syntax element in question because the number of input values is dependent on the resolution. Getting access to the value of this parameter allowed us to drop the background of a video sequence encoded with FMO types 2, 3, and 4. The procedure for the elimination of the background itself was implemented by making use of a cascade of two XSL stylesheets due to the complexity of the XPath expressions. This complexity is a consequence of the pointer based relationship between a slice_header() and the picture and sequence parameter sets, and the fact that the latter can occur more than once in an H.264/AVC compliant bitstream. The encoding and decoding were done by making use of a modified version of the reference encoder and decoder.
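    The logarithmic computation in question can be sketched as follows. The sketch assumes that the length of the syntax element is given by Ceil(Log2(PicSizeInMapUnits / SliceGroupChangeRate + 1)) bits, a rule that should be checked against the standards document; the function name is hypothetical. Integer arithmetic is used so that no floating point rounding can influence the result.

```python
def slice_group_change_cycle_bits(pic_size_in_map_units: int,
                                  slice_group_change_rate: int) -> int:
    """Number of bits used to represent slice_group_change_cycle.

    Assumed rule: Ceil(Log2(PicSizeInMapUnits / SliceGroupChangeRate + 1)),
    where the division is exact (no truncation).
    """
    if pic_size_in_map_units < 1 or slice_group_change_rate < 1:
        raise ValueError("both arguments must be positive")
    # Find the smallest k with 2**k >= (size + rate) / rate,
    # i.e. rate * 2**k >= size + rate.
    target = pic_size_in_map_units + slice_group_change_rate
    bits = 0
    while (slice_group_change_rate << bits) < target:
        bits += 1
    return bits


if __name__ == "__main__":
    # A QCIF picture holds 11 x 9 = 99 macroblocks.
    print(slice_group_change_cycle_bits(99, 1))
```

    Since the result depends on the picture size, a table of precalculated values would have to cover every resolution, which explains why the union element is not a practical alternative here.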

    Footnote: The functionality of the union element can be compared to a switch statement in a programming language.

    Finally, the implementation approach was also applied to the cabac_alignment_one_bit syntax element, being part of the slice_data() structure. The reason for this approach can be explained by an example in which the slices in an H.264/AVC bitstream are shuffled per picture. Although being a pure academic problem, it is a good illustration of the side effects that may occur when performing editing operations in the compressed domain. The latter phenomenon can for instance occur when transcoding an H.264/AVC bitstream from the Main Profile to the Baseline profile. The shuffling of the slices is illustrated by Figure 3(a) in which a sequence of pictures at QCIF resolution is encoded by dividing each picture into three slices of equal sizes (33 macroblocks). The shuffling consists of switching every first and second slice of a picture by manipulating the value of the first_mb_in_slice parameter in the corresponding slice_header() syntax structures. This manipulation will be detected by the Arbitrary Slice Ordering (ASO) feature of a decoder, resulting in a distorted video sequence. Because the first_mb_in_slice syntax element is represented by an exponential Golomb code, the change of zero to 33 and vice versa (33 is the number of the first macroblock in the second slice) will result in a change of the byte alignment of the slice_header() structure. As illustrated by Figure 3(b), the fillByte construction does not deal with that change in a correct way, resulting in a corrupt bitstream at the transition of the slice_header() and the slice_data() syntax structures since the question marks should all have been replaced by ones. The H.264/AVC specification requires that byte alignment is achieved at the transition of the slice_header() and the slice_data() syntax structures by adding an appropriate number of one bits. For simplicity, all syntax elements between the first_mb_in_slice parameter and the cabac_alignment_one_bit parameter are omitted.
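    The effect on the byte alignment can be reproduced with a toy model that follows the same simplification: six arbitrary bits, the exponential Golomb code word of first_mb_in_slice, and then as many one bits as are needed to reach the next byte boundary. The helper names and the six leading bits are assumptions made for this sketch.

```python
def ue_encode(value: int) -> str:
    """Unsigned exponential Golomb code as a string of '0' and '1'."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def align_with_one_bits(bits: str) -> str:
    """Append one bits until the length is a multiple of eight."""
    return bits + "1" * (-len(bits) % 8)


def toy_slice_header(leading_bits: str, first_mb_in_slice: int) -> str:
    """Leading bits, code word, and alignment bits of the toy header."""
    return align_with_one_bits(leading_bits + ue_encode(first_mb_in_slice))


if __name__ == "__main__":
    before = toy_slice_header("101010", 0)
    after = toy_slice_header("101010", 33)
    print(len(before), before)
    print(len(after), after)
```

    Going from 0 to 33 lengthens the code word from 1 to 11 bits, so the number of alignment bits changes from one to seven; an editing tool that does not recompute them leaves the transition to the slice data in an invalid state.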

    Although we could develop a BSD description up to the level of the slice_header() syntax structure, we are currently not able to parse bitstreams in which NALU emulation prevention bytes occur at the level of the syntax structure in question. The presence of emulation prevention bytes ensures that no sequence of consecutive byte-aligned bytes in the NAL unit contains a start code prefix. The reason for not being able to deal with those special bytes is the lack of an appropriate look ahead mechanism for the detection of the bytes in question in the current version of MPEG-21 BSDL. In theory, it would be possible to locate those bytes by making use of the ifNext construction in MPEG-21 BSDL because the latter allows looking ahead. However, such an approach would actually require an ifNext operation that can be executed on every 32 bits aligned position. The latter is not achievable in practice (for instance, due to the usage of VLCs). Another challenge is the appropriate insertion of NALU emulation prevention bytes in manipulated bitstreams. For instance, our implemented procedural objects do not take into account the occurrence of and the need for NALU emulation prevention bytes. Note that this problem does not emerge in the case of MPEG-4 Visual bitstreams due to a totally different organization of the header information such that the usage of emulation bytes is not necessary.
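    For illustration, the mechanism that would have to be supported can be sketched in Python. The sketch assumes the common rule that a byte with value 0x03 is inserted after two zero-valued bytes whenever the next byte is 0x00, 0x01, 0x02, or 0x03; special cases at the very end of a payload are ignored, and the function names are hypothetical.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 so that 00 00 00/01/02/03 never occurs in the output."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes written so far
    for byte in payload:
        if zeros >= 2 and byte <= 3:
            out.append(3)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero-valued bytes."""
    out = bytearray()
    zeros = 0
    for byte in data:
        if zeros >= 2 and byte == 3:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    raw = b"\x25\x00\x00\x01\x00\x00\x00\x00"
    escaped = add_emulation_prevention(raw)
    print(escaped.hex(), remove_emulation_prevention(escaped) == raw)
```

    Both directions work on whole bytes, whereas a BSDL parser that has just read a variable length code is usually not at a byte boundary; this mismatch is the practical obstacle described above.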

    3. APPLIED METHODOLOGY

    This section discusses the way the compressed bitstreams and their corresponding XML based syntax descriptions were generated. Some information is provided about the tools used for doing the profiling of the reference software for MPEG-21 BSDL (i.e. the BinToBSD and BSDToBin tools).

    3.1. The Encoding Process and BSD Generation

    The purpose of the performance measurements is to get some insight into the processing time required by the BinToBSD and BSDToBin tools on the one hand, and the XSL engine on the other hand when exploiting temporal scalability in the current H.264/AVC specification. The latter is being realized by dropping non-reference B slices. Those results are for instance relevant in case one wants to know whether it is possible to use the tools in question in real time (for instance, in a streaming scenario). Some attention will also be paid to the overhead as a consequence of the usage of procedural objects for the parsing of syntax elements having a complex representation. It is also important to note that all MPEG-21 related tools are written in Java.

    For the creation of the H.264/AVC bitstreams, we have relied on the H.264/AVC reference software, version JM 7.6 [16]. As input, the progressive Foreman test sequence was used in the planar YUV 4:2:0 pixel format, having a resolution of 176x144 and a length of 300 pictures. For the encoding of the test sequence in question, nine different lengths were used. The lengths are 21, 49, 73, 99, 199, 299, 399, 499, and 599 pictures. For each length, the encoding was done at a bit rate of 1000 kbit/s and at a fixed frame rate of 30 Hz. Only one I picture was used, alternately followed by a P picture and a non-reference B picture. The encoding process as just mentioned was done for one slice per picture, two slices per picture, and three slices per picture, resulting in a set of 27 bitstreams. All slices of a picture belong to the same type (satisfied due to the value of the slice_type syntax element). Emulation prevention bytes did not occur in the syntax structures parsed by the BSDL reference software.

    Footnote: A reference picture is a picture with nal_ref_idc not equal to zero.
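    To make the adaptation step concrete, the following Python sketch drops non-reference slices directly from an Annex B byte stream. This is a different technique from the one evaluated in this paper, which transforms an XML description with XSLT and regenerates the bitstream with BSDToBoBin-style tooling (the BSDToBin tool); the sketch only illustrates which NAL units are removed. It assumes that every coded slice with nal_ref_idc equal to zero is a B slice, as in the test streams described above, and it does not parse slice_type. All function names are hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_nal_units(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, not payload.
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


def is_non_reference_slice(nal: bytes) -> bool:
    """True for a coded slice of a non-IDR picture with nal_ref_idc == 0."""
    header = nal[0]
    nal_ref_idc = (header >> 5) & 0x03
    nal_unit_type = header & 0x1F
    return nal_unit_type == 1 and nal_ref_idc == 0


def drop_non_reference_slices(stream: bytes) -> bytes:
    kept = [n for n in split_nal_units(stream)
            if n and not is_non_reference_slice(n)]
    return b"".join(b"\x00" + START_CODE + n for n in kept)


# Toy stream: SPS, PPS, IDR slice, then P and non-reference slices in turn.
# The second byte of each unit is a dummy payload.
toy_units = [b"\x67\xAA", b"\x68\xBB", b"\x65\xCC",
             b"\x41\xDD", b"\x01\xEE", b"\x41\xDD", b"\x01\xEE"]
toy_stream = b"".join(b"\x00" + START_CODE + u for u in toy_units)
adapted = drop_non_reference_slices(toy_stream)
kept_headers = [u[0] for u in split_nal_units(adapted)]
```

    In the toy stream, the two units with header byte 0x01 (nal_ref_idc 0, nal_unit_type 1) are removed, while the SPS, PPS, IDR slice, and P slices are kept.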

    The actual performance analysis was done for several schemes written in BSDL: a full scheme describing all syntax elements up to the level of the slice_header() data structure, and a normalized scheme (referred to below as the simplified scheme) describing only those parameters that are really necessary for exploiting temporal scalability. The latter implies parsing everything up to the level of the slice_type parameter in the slice_header() data structure for the slices containing coded picture data. The SPS and PPS are not analyzed in the case of the simplified scheme. For the generation of the XML descriptions, version 1.1.3 of the BSDL reference software was used. Timing was done by relying on the timers made available in the two BSDL tools, taking into account the overhead related to input and output.

    3.2. Performance Measurements

    The performance measurements for the tools of interest were done by making use of HPjmeter [17], a program that helps to detect performance bottlenecks in Java-based software by graphically displaying profiling data. The tool in question was used on a PC having an Intel Pentium IV CPU, clocked at 2.61 GHz, and having 512 MB of RAM at its disposal. The operating system used was Windows XP Pro (Service Pack 1), running Sun Microsystems' Java 2 Runtime Environment (Standard Edition). The profiling option chosen was -Xrunhprof:cpu=times. This option makes it possible to measure the time taken by the individual methods, and it also generates a sorted list in which the methods are ranked by their percentage of the total CPU time taken by the application.
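    One way to obtain such a ranked list from a call tree is to compute, for each method, the time spent in the method itself without the time spent in its callees, and to express it as a percentage of the total. The Python sketch below does this for a toy call tree. The method names and timings are invented for illustration and do not come from the measurements in this paper, and the code does not parse actual profiler output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Call:
    method: str
    inclusive_ms: float  # time in the method including its callees
    children: List["Call"] = field(default_factory=list)


def exclusive_times(root: Call) -> Dict[str, float]:
    """Own time per method: inclusive time minus the callees' inclusive time."""
    totals: Dict[str, float] = {}
    stack = [root]
    while stack:
        call = stack.pop()
        own = call.inclusive_ms - sum(c.inclusive_ms for c in call.children)
        totals[call.method] = totals.get(call.method, 0.0) + own
        stack.extend(call.children)
    return totals


def ranked_percentages(totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Methods sorted by their share of the total time, largest first."""
    total = sum(totals.values())
    shares = [(m, 100.0 * t / total) for m, t in totals.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical call tree (names and numbers are made up).
tree = Call("main", 100.0, [
    Call("parse", 80.0, [Call("evaluate_xpath", 60.0), Call("read", 10.0)]),
    Call("write", 5.0),
])
ranking = ranked_percentages(exclusive_times(tree))
```

    In this toy tree the hypothetical evaluate_xpath method dominates the ranking even though main has the largest inclusive time, which is the kind of distinction a per-method ranking is meant to expose.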

    4. EXPERIMENTAL RESULTS

    This section covers some of the performance results that were obtained during our research. Figure 4(a) indicates that the processing time required by the BinToBSD tool is characterized by an exponential behavior in terms of the number of slices in the case of the simplified BSD scheme (note that the Y-axis has a logarithmic scale and that the points on the X-axis are not equidistant). For instance, 145 seconds are needed in order to generate a BSD for a bitstream containing 599 pictures when making use of one slice per picture. One can also notice that the processing is done in terms of slices: a stream of 300 pictures with one slice per picture results in the same behavior for the BinToBSD tool as a stream of 100 pictures with three slices per picture. The exponential behavior of the BinToBSD tool is even more pronounced when making use of the full scheme for generating a BSD, especially due to the evaluation of a lot of control statements necessary for guiding the parsing process, the latter often implemented as complex XPath expressions.
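    The observation that the processing time depends on the number of slices rather than on the number of pictures can be checked against the BinToBSD timings for the simplified scheme listed in Table 3 (Appendix B). The Python sketch below groups configurations by their approximate total slice count and reports the relative spread within each group. The dictionary only copies values from that table; the grouping rule and the names are illustrative choices.

```python
from collections import defaultdict

# (slices per picture, pictures) -> BinToBSD time in seconds,
# simplified scheme with DOM, copied from Table 3.
simplified_bintobsd_s = {
    (1, 199): 19.3, (2, 99): 18.8,
    (1, 299): 41.9, (3, 99): 39.7,
    (1, 599): 145.2, (2, 299): 144.1, (3, 199): 142.3,
}


def group_by_slice_count(timings):
    """Group timings by total slice count, rounded to the nearest hundred."""
    groups = defaultdict(list)
    for (slices, pictures), seconds in timings.items():
        key = round(slices * pictures / 100) * 100
        groups[key].append(seconds)
    return dict(groups)


def relative_spread(values):
    """Difference between slowest and fastest run, relative to the slowest."""
    return (max(values) - min(values)) / max(values)


groups = group_by_slice_count(simplified_bintobsd_s)
spreads = {count: relative_spread(v) for count, v in groups.items()}
for count in sorted(spreads):
    print(f"~{count} slices: spread {spreads[count]:.1%}")
```

    For these table entries, configurations with roughly the same total number of slices differ by only a few percent in processing time, which is consistent with the statement above.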

    A first attempt to boost the performance consisted of making the simplified BSD scheme deterministic. The two previous schemes, the full one and the simplified one, are generic in the sense that they can be applied to any H.264/AVC compliant bitstream, regardless of the profile implemented or the GOP structure used. When taking into account the latter information, together with the fact that the SPS and the PPS are always the first two NALUs, one can create a BSD scheme that is much simpler because it is then possible to drop a lot of complex control statements. However, this scheme still resulted in an exponential behavior of the BinToBSD tool, as can be seen in Figure 4(b).

    An extensive profiling with the HPjmeter tool revealed that the performance problem of the BinToBSD program, making use of the simplified deterministic scheme, could still be traced back to the usage of XPath expressions. To be more specific, the performance problem in question is related to the usage of XPath expressions when the nOccurs attribute is used. The latter BSDL attribute specifies how many times a particular syntax element can occur in a bitstream by making use of an XPath expression. When this attribute does not occur in the BSD scheme, the BinToBSD tool falls back to a default value of one (a constant XPath expression) for the attribute in question since most syntax elements only occur once on a particular position in a bitstream. However, when this attribute does occur in the BSD scheme, the BinToBSD tool duplicates the internal data structure containing the XML description of the structure of the H.264/AVC bitstream, anticipating the possible execution of an XPath expression. Because the nOccurs attribute was used in the declaration of every possible syntax element for clarity purposes (even when the syntax element could only occur once), its presence resulted

    This genericity is also one of the major reasons for the complexity of the XPath expressions.


    [Figure 4, panels (a) to (f); plots not reproduced. Panels (a) to (d) show results for 21 to 599 pictures with 1, 2, and 3 slices per picture.
    (a) Simplified scheme (DOM): BinToBSD processing time [s], logarithmic Y-axis.
    (b) Simplified deterministic scheme (DOM): BinToBSD processing time [s], logarithmic Y-axis.
    (c) Simplified scheme (Xalan implementation): XSL processing time [ms].
    (d) Simplified scheme (DOM): BSDToBin processing time [s].
    (e) Simplified deterministic scheme (DOM): accumulated exclusive method time (CPU, percentage) for 200 pictures with 1 slice per picture; packages: apache.xml.dtm, apache.crimson, apache.xml.utils, java.lang, java.util.Vector, java.io, org.mpeg21, other.
    (f) Optimized simplified deterministic scheme (DOM): accumulated exclusive method time (CPU, percentage) for 200 pictures with 1 slice per picture; packages: java.lang, java.io, sun, mpeg21.XSD, mpeg21.utils, mpeg21.io, other.]

    Figure 4. Overview of the experimental results.


    in a duplication of the XML data structure for every syntax element. The behavior in question is reflected by the execution times needed by the functions that are responsible for the duplication of the XML structure. As can be deduced from the pie chart in Figure 4(e), a lot of processing time is spent in the Document Table Model (DTM) package (org.apache.xml.dtm). DTM is an interface designed specifically to optimize performance and minimize storage when making use of the Apache XPath and XSLT implementations [18]. Note that the exclusive method time is the time spent in a method, not taking into account the time spent in the functions that were called by the method in question.
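    To see why duplicating the description for every syntax element is costly, consider a description that grows by one node per parsed element. If the whole structure is copied each time an element is parsed, the number of copied nodes after n elements is 1 + 2 + ... + n = n(n+1)/2, so the copying work grows much faster than the number of elements (quadratically in this model). The Python sketch below counts copied nodes under this simplified model. It is an illustration under stated assumptions and does not reproduce the internals of the BinToBSD tool.

```python
def copied_nodes_with_duplication(n_elements: int) -> int:
    """Count nodes copied when the description is duplicated per element."""
    description = []
    copied = 0
    for i in range(n_elements):
        description.append(f"element_{i}")
        snapshot = list(description)  # stand-in for duplicating the XML structure
        copied += len(snapshot)
    return copied


def copied_nodes_without_duplication(n_elements: int) -> int:
    """Without duplication, no existing node is ever copied."""
    description = []
    for i in range(n_elements):
        description.append(f"element_{i}")
    return 0


for n in (100, 200, 400):
    print(n, copied_nodes_with_duplication(n))
```

    Doubling the number of elements roughly quadruples the copying work in this model, which matches the qualitative observation that the slowdown becomes more severe as the number of slices grows.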

    Taking into account the latter knowledge, a much more efficient version of the simplified deterministic scheme in BSDL could be created. This finally resulted in a BSD generation that is faster than real time, because of the lack of the nOccurs attribute and the lack of XPath expressions in the scheme in question. This is also illustrated by the shift of the accumulated exclusive method time to other packages in Figure 4(f), especially to the ones that are responsible for input and output operations. For example, about 4 seconds are needed in order to generate a BSD for a bitstream containing 599 pictures when making use of three slices per picture. The latter example took about 992 seconds in the case of the simplified deterministic BSDL scheme, about 1096 seconds in the case of the original simplified scheme, and about 68041 seconds in the case of the full scheme. The average speed-up of the BinToBSD tool, using the optimized simplified deterministic scheme, is 90.95%, compared to the execution time needed by the original simplified scheme (standard deviation: 13.35%). Note that the BSD resulting from the usage of the optimized scheme is still equivalent to the one generated by the simplified scheme and by the very first deterministic scheme, thus still enabling the exploitation of temporal scalability.

    With respect to the processing time needed by the Xalan XSL engine, one can observe execution times that are quite fast: generating an XML document from a description that originally covers 599 pictures (one slice per picture) requires 625 milliseconds. The resulting XML document only contains the descriptions of NALUs carrying an SPS, a PPS, or compressed data related to I and P slices (and no longer to B slices). The same observation applies to the BSDToBin tool, regardless of whether the Document Object Model (DOM) or the Simple API for XML (SAX) is used for the internal representation and processing of the XML description. The fast behavior of the BSDToBin tool can be explained by the fact that it is no longer necessary to evaluate XPath expressions. This leads to the observation of a potential asymmetrical behavior between the BSD encoder (BinToBSD) and the BSD decoder (BSDToBin) when the encoder in question has to deal with a lot of XPath expressions, the latter being very similar to the behavior of MPEG-x and H.26x encoders and decoders. It is also interesting to see that the fast behavior of the optimized simplified deterministic scheme proves that it is possible to make use of Java procedural objects for achieving byte alignment and for the decoding and encoding of Exponential Golomb coded syntax elements in an efficient way. Some of the quantitative results can be found in Appendix B.
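    For reference, the unsigned Exponential Golomb code ue(v) used for many H.264/AVC syntax elements writes the value v + 1 in binary, preceded by as many zero bits as there are bits after the leading one. The Python sketch below encodes and decodes ue(v) on strings of '0' and '1' characters and pads a bit string to a byte boundary. It illustrates the coding only and is not the Java procedural object used in the experiments. The function names are hypothetical, and padding with zero bits is an assumption made for simplicity.

```python
from typing import Tuple


def encode_ue(value: int) -> str:
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("ue(v) is only defined for non-negative values")
    code = bin(value + 1)[2:]             # binary representation of value + 1
    return "0" * (len(code) - 1) + code   # prefix of leading zero bits


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one ue(v) starting at bit position pos; return (value, next pos)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    start = pos + leading_zeros
    end = start + leading_zeros + 1
    return int(bits[start:end], 2) - 1, end


def byte_align(bits: str) -> str:
    """Pad with zero bits up to the next multiple of eight bits."""
    return bits + "0" * (-len(bits) % 8)


# Two consecutive ue(v) values followed by alignment.
stream = encode_ue(0) + encode_ue(7)
first, pos = decode_ue(stream)
second, pos = decode_ue(stream, pos)
aligned = byte_align(encode_ue(3))
```

    Because each codeword announces its own length through the zero prefix, a parser cannot skip such an element without decoding it, which is why these elements need dedicated handling when a description has to reach a field such as slice_type.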

    5. CONCLUSIONS AND FUTURE WORK

    After having given an overview of the design characteristics and syntaxes of the H.264/AVC specification on the one hand, and of MPEG-21 BSDL on the other hand, a detailed discussion was provided about the extensions needed in order to combine BSDL with the Annex B syntax. Some of those extensions can be mapped to the elementary syntax functions as defined in the H.264/AVC specification. It would be useful if MPEG-21 BSDL could incorporate a relevant subset of their functionality. With respect to the performance measurements, we have shown that it is possible to make use of an extended version of MPEG-21 BSDL for the efficient exploitation of temporal scalability in the current H.264/AVC specification, taking into account certain restrictions. Our measurements have also illustrated the necessity to be careful with the usage of XPath expressions in a BSDL scheme because the latter can have a serious impact on the performance.

    Further research will be necessary in order to extend BSDL such that it can take into account the occurrence and appropriate insertion of NALU emulation prevention bytes, together with a study of the possible side effects that may occur when editing H.264/AVC bitstreams in the compressed domain. A scalable coding scheme should take into account the problems as just mentioned. It would also be very interesting if such a scheme would allow the full exploitation of scalability by only requiring knowledge about the high-level structure of the corresponding bitstream. The latter would make it much easier to bridge the gap to the power of the MPEG-21 tools. Other points of interest are a memory complexity analysis and the usage of metadata in order to realize smart adaptations.


    APPENDIX A. AN H.264/AVC BITSTREAM SYNTAX DESCRIPTION IN BSDL

    Table 1. Description of the first seven syntax elements of a SPS in BSDL.

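    [XML markup not recoverable from the source; the surviving element values, read in order, appear to be 88 0 0 0 0 21 0.]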

    Table 2. Resulting output of the BinToBSD tool for the first seven syntax elements of a SPS.

    APPENDIX B. QUANTITATIVE RESULTS

                Full BSD                              Simplified BSD
    sli.  pic.  BinToBSD  XSL     BSDToBin  BSDToBin  BinToBSD  XSL     BSDToBin  BSDToBin  BinToBSD     BinToBSD
                D (s)     (ms)    D (s)     S (s)     D (s)     (ms)    D (s)     S (s)     D - det (s)  D - opt (s)
    1     21    8.7       296     0.7       0.9       1.1       234     0.5       0.5       1.1          0.5
          49    28.9      375     1.0       1.0       2.4       313     0.6       0.7       2.2          0.6
          73    55.2      406     1.2       1.2       4.1       359     0.8       0.8       3.6          0.7
          99    92.4      468     1.4       1.5       6.2       375     1.0       1.0       5.5          0.9
          199   344.4     579     2.3       2.2       19.3      469     1.7       1.7       17.7         1.3
          299   770.8     609     3.0       2.9       41.9      500     2.3       2.3       37.7         1.7
          399   1371.8    672     3.3       3.8       70.2      500     2.8       3.0       62.9         2.2
          499   2214.3    734     4.6       4.4       104.6     562     3.5       3.5       94.6         2.5
          599   3408.9    859     5.3       5.1       145.2     625     4.1       4.1       131.8        2.8
    2     21    22.8      344     0.8       0.8       2.0       281     0.5       0.5       1.8          0.5
          49    91.1      484     1.2       1.2       6.0       360     0.7       0.8       5.2          0.7
          73    191.7     500     1.4       1.4       11.4      422     0.9       0.9       10.1         0.8
          99    346.7     547     1.7       1.6       18.8      438     1.1       1.1       16.9         1.0
          199   1366.6    688     2.7       2.6       68.8      515     1.8       1.8       62.0         1.5
          299   3378.9    859     3.7       3.5       144.1     625     2.5       2.5       129.8        2.0
          399   6953.0    1016    4.6       4.4       241.9     688     3.3       3.2       218.6        2.5
          499   12851.1   1125    5.5       5.2       363.4     750     3.9       3.8       329.9        2.8
          599   21173.7   1265    6.4       6.0       512.0     796     4.5       4.5       463.5        3.3
    3     21    43.7      391     0.9       0.9       3.2       328     0.5       0.5       2.8          0.6
          49    194.2     484     1.3       1.3       11.2      406     0.8       0.8       10.0         0.8
          73    417.0     594     1.6       1.5       21.5      484     1.0       1.0       19.7         0.9
          99    768.2     657     1.9       1.8       39.7      469     1.2       1.2       35.8         1.1
          199   3366.6    860     3.1       2.9       142.3     609     2.0       1.9       128.7        1.7
          299   9498.1    1047    4.2       3.9       298.3     719     2.8       2.8       270.6        2.2
          399   21072.4   1282    5.3       5.0       508.2     813     3.5       3.4       461.6        2.7
          499   40769.6   1454    6.4       6.1       772.1     922     4.3       4.2       704.9        3.2
          599   68041.2   1641    7.2       6.9       1095.9    985     5.1       4.9       991.8        3.8

    Table 3. Overview of the performance measurements: sli. is the number of slices per picture, pic. is the number of pictures, D denotes DOM, S denotes SAX, det denotes the deterministic simplified scheme, while opt stands for the optimized version of the latter. The temporal downsampling resulted in a 48.86% reduction of the bitstream size on average (standard deviation: 0.89%), the latter being dependent on the rate-distortion model used.
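    The figures in Table 3 allow a quick check of the real-time claim and of the gain achieved by the optimized scheme. At 30 Hz, a bitstream of 599 pictures lasts about 20 seconds, so a BSD generation time below that duration can be read as faster than real time (this reading of "real time" is an assumption of the sketch). The Python code below performs the arithmetic for the configuration with three slices per picture. The variable and function names are illustrative, and because the tabulated timings are rounded, the per-row reduction computed here is not expected to reproduce the reported average exactly.

```python
FRAME_RATE_HZ = 30.0   # frame rate used for encoding
PICTURES = 599         # longest test bitstream

# BinToBSD times in seconds for 599 pictures, 3 slices per picture (Table 3).
simplified_s = 1095.9
deterministic_s = 991.8
optimized_s = 3.8


def playback_duration(pictures: int, frame_rate: float) -> float:
    """Duration of the sequence in seconds at the given frame rate."""
    return pictures / frame_rate


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original execution time that was saved."""
    return 1.0 - after / before


duration_s = playback_duration(PICTURES, FRAME_RATE_HZ)
real_time_factor = duration_s / optimized_s
reduction = relative_reduction(simplified_s, optimized_s)

print(f"playback duration: {duration_s:.1f} s")
print(f"optimized scheme runs {real_time_factor:.1f}x faster than real time")
print(f"reduction versus simplified scheme: {reduction:.2%}")
```

    For this configuration, only the optimized scheme stays below the playback duration; the simplified and deterministic schemes exceed it by a wide margin.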


    ACKNOWLEDGMENTS

    The authors would like to thank the developers of the MPEG-21 BSDL reference software for making available the required extensions. We would also like to thank Davy De Schrijver for the clarifying discussions about the usage of the HPjmeter profiling tool.

    The research activities that have been described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research-Flanders (FWO-Flanders), the Belgian Federal Science Policy Office (BFSPO), and the European Union.

    REFERENCES

    1. T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol. 13, pp. 560-576, July 2003.

    2. I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, John Wiley & Sons, Ltd, 2003.

    3. "Requirements for AVC Codec," MPEG-document ISO/IEC JTC1/SC29/WG11/N4672, Joint Video Team of ISO/IEC JTC1/SC29/WG11 and ITU-T SG16/Q.6, Jeju, Korea, Mar. 2002. Available on http://www.chiariglione.org/mpeg/working_documents.

    4. H. Schwarz, D. Marpe, and T. Wiegand, "Subband Extension of H.264/AVC," JVT-document JVT-K023, Munich, Germany, Joint Video Team of ISO/IEC JTC1/SC29/WG11 and ITU-T SG16/Q.6, Mar. 2004.

    5. "Requirements and Applications for Scalable Video Coding," MPEG-document ISO/IEC JTC1/SC29/WG11 N6025, Moving Picture Experts Group (MPEG), Gold Coast, Australia, Mar. 2003. Available on http://www.chiariglione.org/mpeg/working_documents.

    6. M. Amielh and S. Devillers, "Bitstream Syntax Description Language: Application of XML-Schema to Multimedia Content Adaptation," in WWW2002: The Eleventh International World Wide Web Conference, (Honolulu, Hawaii), May 2002. Available on http://www2002.org/CDROM/alternate/334/.

    7. I. Burnett, R. V. de Walle, K. Hill, J. Bormans, and F. Pereira, "MPEG-21: Goals and achievements," IEEE Multimedia 10, pp. 60-70, October-December 2003.

    8. "Multimedia Framework - Part 7: Digital Item Adaptation," Final Draft International Standard, MPEG-document ISO/IEC JTC1/SC29/WG11/N6167, Moving Picture Experts Group (MPEG), Waikoloa, USA, Dec. 2003.

    9. D. C. Fallside, "XML Schema Part 0: Primer," recommendation, World Wide Web Consortium (W3C), http://www.w3c.org/TR/xmlschema-0/, May 2001.

    10. H. S. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, "XML Schema Part 1: Structures," recommendation, World Wide Web Consortium (W3C), http://www.w3c.org/TR/xmlschema-1/, May 2001.

    11. P. V. Biron and A. Malhotra, "XML Schema Part 2: Datatypes," recommendation, World Wide Web Consortium (W3C), http://www.w3c.org/TR/xmlschema-2/, May 2001.

    12. J. Magalhaes and F. Pereira, "Using MPEG standards for multimedia customization," IEEE Trans. Circuits Syst. Video Technol. 19, pp. 437-456, May 2004.

    13. S. Lerouge, P. Lambert, and R. Van de Walle, "Multi-criteria optimization for scalable bitstreams," in Proceedings of the 8th International Workshop on Visual Content Processing and Representation, pp. 122-130, Springer, (Madrid), September 2003.

    14. W. De Neve, F. De Keukelaere, K. De Wolf, and R. Van de Walle, "Applying MPEG-21 BSDL to the JVT H.264/AVC Specification in MPEG-21 Session Mobility Scenarios," in Proceedings of the 5th International Workshop on Image Analysis for Multimedia Interactive Services, 4 pp., (Lisboa), April 2004.

    15. J. Clark and S. DeRose, "XML Path Language 1.0," recommendation, World Wide Web Consortium (W3C), http://www.w3c.org/TR/xpath, Nov. 1999.

    16. "JVT H.264/AVC Reference Software." http://bs.hhi.de/suehring/tml/download/.

    17. "HPjmeter." http://www.hp.com/products1/unix/java/hpjmeter/.

    18. "The Document Table Model." http://xml.apache.org/xalan-j/dtm.html.
